COMPUTATIONAL PHONOLOGY: MERGED, NOT MIXED 
Egon Berendsen, Department of Phonetics, University of Utrecht, The Netherlands 
Simone Langeweg, Phonetics Laboratory, University of Leyden, The Netherlands 
Hugo van Leeuwen, Institute of Perception Research, Eindhoven, The Netherlands 
0. Introduction 
Research into text-to-speech systems has become a 
rather important topic in the areas of linguistics 
and phonetics. Particularly for English, several 
text-to-speech systems have been established (cf. for
example Hertz (1982), Klatt (1976)). For Dutch,
text-to-speech systems are being developed at the 
University of Nijmegen (cf. Wester (1984)) and at the 
Universities of Utrecht and Leyden and the Institute 
of Perception Research (IPO) Eindhoven as well. In
this paper we will be concerned with the 
grapheme-to-phoneme conversion component as part of 
the Dutch text-to-speech system which is being de- 
veloped in Utrecht, Leyden and Eindhoven. 
One of our primary interests is that the
grapheme-to-phoneme system not only has to generate
the input for speech synthesis, either in allophone
or diphone form, but that it has to be usable for
other purposes as well. Thus, the system has to
satisfy the following demands:
- its output must form a proper and flexible 
input for diphone as well as allophone synthesis; 
- it must be possible to easily generate 
phonematized lists on the basis of orthographic 
input; 
- it must be possible to automatically obtain
information regarding the relation between 
graphemes and phonemes in texts; 
- the system has to be user-friendly, so that it 
can be addressed by linguists without computer 
training (for example to test their phonological 
rules). 
In our view, there are two aspects to a 
grapheme-to-phoneme conversion system: a linguistic 
and a computational one. The linguist, in fact, pro- 
vides the grammar necessary for the conversion and 
the engineer implements this grammar into a computer 
system. Thus, knowledge about spelling and linguistics
is separated from the technical implementation:
the linguist provides the rules and the system exe- 
cutes them. The two components will also constitute 
the main sections of this paper. 
1. Linguistics
For grapheme-to-phoneme conversion it is expedient to 
assume several modules. In the first module, 
'difficult' elements like numbers, acronyms and ab- 
breviations, have to be changed into their cor- 
responding full graphemic notation. Next, one has to 
recover units from the spelling which influence gra- 
pheme-to-phoneme conversion: for example, words forming
compounds are written as one uninterrupted string
in Dutch, but have to be recovered because they in- 
fluence graphemic conversion and stress assignment. 
The third and most important module concerns the 
rules which assign phonemes to graphemes. We then 
have phoneme information, and can establish further 
relevant units on the basis of this information. Fi- 
nally, phonological processes have to be accounted 
for in modules for stress assignment and segmental 
phenomena. 
1.1. Phoneme-to-grapheme assignment
Our starting point for the development of the gra- 
pheme-to-phoneme system is that graphemes and 
phonemes are different entities, which should be 
represented at separate levels. As graphemes form the 
input for the conversion, the grapheme level is 
filled from the start. The derivation of phonemes is 
performed in the following way: to each grapheme or 
group of graphemes a corresponding phoneme is as- 
signed at the phoneme level. This is represented in 
(1), where lower case letters indicate graphemes and
capitals indicate phonemes. 
(1) grapheme level: a b c d e f g
    phoneme level:  [capital phoneme symbols not recoverable]
Notice that the assumption of two levels, one for 
graphemes and one for phonemes, makes it possible to 
obtain information about the relation between gra- 
phemes and phonemes. As we will see below this as- 
sumption has some other attractive consequences. 
Notice furthermore, that to each grapheme or se- 
quence of graphemes a phoneme has to be assigned, ex- 
cept in cases where there is no corresponding 
phoneme. For the non-linguist, it may come as a sur- 
prise that a great number of regularities and
subregularities in correspondence between graphemes and
phonemes can be found. The assignment of phonemes to
graphemes is therefore done by rule. Of course, there
are always words that cannot be captured by the 
rules: these will have to be enumerated. The order of 
application will then be enumeration, sub-rules and 
rules. The string of graphemes is scanned sequentially:
first a phoneme is assigned to the initial grapheme,
then this is repeated for the second grapheme
and so on. This procedure works very quickly since,
if the grapheme under consideration is an a, only the
rules assigning phonemes to a will have to be
considered. The rule format used here is very similar to
the well-known format of Chomsky and Halle (1968) 
(=SPE). Some further mechanisms are added to their 
rule format: 
- it is possible to negate elements or groups of 
elements in the environment of the rule; 
- one can use so-called global rules, referring 
back to information available earlier in the
derivation because we use a two-level approach;
- it is possible to use definitions instead of
repeatedly used sequences, in the rule 
environment, such as sequences indicating 
syllable boundaries. 
Below, we will demonstrate how our linguistic 
knowledge can be expressed in rules and sub-rules. 
These rules will not cover all cases, but they will
serve as an illustration. A rule completely written
in the standard SPE format may be as follows. 
(2) c,h -> SJ / _ i,n,{<-cons>}
                      {c      }
This rule assigns the phoneme SJ/[ʃ] to the letters
ch, if the latter is followed by the letters in fol- 
lowed by either a vowel or c, thus accounting for 
chinchilla (chinchilla) and China (China). As one can 
see from (2) no "long" braces are used to indicate 
several alternatives, but each alternative is sur- 
rounded by short braces, since it is impossible to 
use "long" braces in the computer. Furthermore, pho- 
nological features are surrounded by angled brackets. 
The possibility of negating an element is illus- 
trated in (3). 
(3) c,h -> SJ / i _ e,'l
The grapheme preceded by a quote in (3) is negated. 
This means that every sequence consisting of ch pre- 
ceded by i and followed by e which in turn may not be
followed by l, is assigned the phoneme SJ. Thus we
account for the alternation in ch-pronunciation in
fiche (chip) where ch is SJ and richel (sill) where
ch is X/[x].
As a last example, a rule is given in which only 
phoneme information has a triggering effect. 
(4) au -> OO / SJ _
To the graphemic sequence au the phoneme OO/[o:] is
assigned if it is preceded by the phoneme SJ as is 
the case in chauvinisme (chauvinism) where ch is SJ. 
Notice furthermore, that one can only use phoneme in- 
formation in the lefthand environment of rules as-
signing phonemes to graphemes, since the string is
scanned from left to right and grapheme by grapheme.
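
The scanning procedure and the contexts of rules (2) to (4) can be sketched as follows. This is a minimal illustration in Python, not the system's actual implementation. The vowel set standing in for <-cons>, the fallback that maps an unmatched grapheme to its capital, the default ch -> X, and the extra rule that makes ch come out as SJ before au are all assumptions introduced only so that the example runs.

```python
VOWELS = set("aeiouy")  # assumption: letters standing in for the feature <-cons>

def any_left(graphemes_before, phonemes_before):
    return True

def any_right(graphemes_after):
    return True

def rule2_right(after):
    # (2): _ i,n followed by either a vowel or c
    return after.startswith("in") and len(after) > 2 and (
        after[2] in VOWELS or after[2] == "c")

def rule3_left(graphemes_before, phonemes_before):
    # (3): i _
    return graphemes_before.endswith("i")

def rule3_right(after):
    # (3): _ e,'l  (an e that is not followed by l)
    return after.startswith("e") and not after.startswith("el")

def rule4_left(graphemes_before, phonemes_before):
    # (4): SJ _  (a lefthand context on the phoneme level)
    return bool(phonemes_before) and phonemes_before[-1] == "SJ"

# (focus graphemes, phoneme, lefthand test, righthand test), in order of application
RULES = [
    ("ch", "SJ", any_left, rule2_right),
    ("ch", "SJ", rule3_left, rule3_right),
    ("ch", "SJ", any_left, lambda after: after.startswith("au")),  # hypothetical
    ("ch", "X", any_left, any_right),                              # hypothetical default
    ("au", "OO", rule4_left, any_right),
]

# Only the rules for the grapheme under consideration are ever inspected.
RULES_BY_FIRST = {}
for rule in RULES:
    RULES_BY_FIRST.setdefault(rule[0][0], []).append(rule)

def convert(word):
    """Scan left to right; return (grapheme group, phoneme) pairs, i.e. both levels."""
    pairs = []
    pos = 0
    while pos < len(word):
        phonemes = [phoneme for _, phoneme in pairs]
        for focus, phoneme, left_ok, right_ok in RULES_BY_FIRST.get(word[pos], []):
            if (word.startswith(focus, pos)
                    and left_ok(word[:pos], phonemes)
                    and right_ok(word[pos + len(focus):])):
                pairs.append((focus, phoneme))
                pos += len(focus)
                break
        else:
            # Placeholder fallback, not a real Dutch conversion rule.
            pairs.append((word[pos], word[pos].upper()))
            pos += 1
    return pairs
```

Because the result keeps each grapheme group next to its phoneme, the relation between the two levels can be read off directly; the lefthand test of rule (4) only works because the phoneme for ch has already been assigned when au is reached.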
A further addition to the SPE format which has
been developed is the possibility to require two
things at the same time in the environment of rules:
so-called coordinations. The need for such an option
is illustrated by the following example. Using standard
SPE rules, we have to postulate the two disjunctively
ordered rules in (5) plus the rule in (6) to
account for the variation in pronunciation of final e
in Becel (e=E/[ɛ]) (trade mark) and adel (e=@/[ə])
(nobility). Rule (6) is independently motivated by
cases such as berg (mountain). Rule (5a) takes
precedence over rule (5b) and (5b) over (6). This means
that if (5a) has been applied both (5b) and (6) will
not be applied, although their requirements are met.
Notice that in (6) the possibility to use definitions
is illustrated.
(5) a) e -> E / c _ l,<-segm>
    b) e -> @ / VOC,CONS _ l,<-segm>
(6)    e -> E / _ SG
       where SG constitutes the definition for
       a sequence of consonants which follow a
       vowel in a closed syllable
The derivation of the examples mentioned above is as
follows.
(7) [derivation diagram not recoverable; the surviving labels are rules (5b) and (6)]
However, if we have the possibility to use coordinations,
we only have to state rule (8) which takes
precedence over rule (6). The + symbols placed
beneath each other indicate coordination.
(8) e -> @ / VOC,+CONS _ l,<-segm>
                 +'c
The derivation (7) now turns into (9). 
(9) [derivation diagram not recoverable; the surviving label is rule (6)]
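
The effect of a coordination can be sketched by comparing the two rule sets on the three example words. This is a minimal illustration in Python under several assumptions: the trade-mark example is spelled becel, the excluded consonant is c, <-segm> is taken to be the end of the word, and the definition SG is approximated very roughly as "at least one consonant, followed by the word end or another consonant". All function names are hypothetical.

```python
VOWELS = set("aeiouy")  # assumption: letters standing in for VOC

def is_cons(ch):
    return ch.isalpha() and ch not in VOWELS

def rule_5a(word, i):
    # e -> E / c _ l,<-segm>
    return i >= 1 and word[i - 1] == "c" and word[i + 1:] == "l"

def rule_5b(word, i):
    # e -> @ / VOC,CONS _ l,<-segm>
    return (i >= 2 and word[i - 2] in VOWELS and is_cons(word[i - 1])
            and word[i + 1:] == "l")

def rule_8(word, i):
    # e -> @ / VOC,(CONS and not c) _ l,<-segm>: both requirements on one position
    return (i >= 2 and word[i - 2] in VOWELS
            and is_cons(word[i - 1]) and word[i - 1] != "c"
            and word[i + 1:] == "l")

def rule_6(word, i):
    # e -> E / _ SG, with SG roughly approximated (assumption)
    rest = word[i + 1:]
    return len(rest) >= 1 and is_cons(rest[0]) and (len(rest) == 1 or is_cons(rest[1]))

def apply_disjunctive(word, i, ordered_rules):
    """Return the phoneme of the first rule whose context is met; later rules are skipped."""
    for context_ok, phoneme in ordered_rules:
        if context_ok(word, i):
            return phoneme
    return None

STANDARD_SPE = [(rule_5a, "E"), (rule_5b, "@"), (rule_6, "E")]
WITH_COORDINATION = [(rule_8, "@"), (rule_6, "E")]
```

With these definitions, both rule sets give E for the final e of becel, @ for the e of adel, and E for the e of berg; the coordinated version needs one rule fewer because becel simply fails rule (8) and falls through to rule (6).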
1.2. Other modules
Until this point in the paper, we have concentrated 
solely on grapheme-to-phoneme conversion proper, but 
as we already stated above, the block of rules
assigning phonemes to graphemes is surrounded by other
rule modules with different functions. We will deal
with some of these modules here, disregarding
abbreviations, acronyms, numbers, and phonological
processes above the word level.
Since Dutch is a stress language, one aspect of
the grapheme-to-phoneme conversion must be the
assignment of word stress. The question is then how
word stress must be assigned. Dutch word stress is
not fixed, i.e. always assigned to the same syllable
position, but lexical, i.e. the position may vacillate
(cf. kálium (potassium), kabóuter (imp),
kapitéin (captain)). The rules then must refer to
morphological and/or syllable structure.
Syllable weight is decisive in stress assignment
in monomorphemic words. For compounds, however,
morphological structure has to be recognized. By making
reference to sequences of vowels and consonants, the
syllable weight can be defined. As is well-known from
SPE, stress rules in monomorphemic words are
disjunctively ordered (cf. (10)). For our system, the
implication is that the whole input string (for these
stress rules) has to be scanned for each sub-stress
rule. Furthermore, <-st(ress)> has to be present as a
feature of the vowels in the righthand environment of
the rule. In most cases, compound stress is assigned
to the first word of the compound. Within the
SPE-format, compound stress assignment leads to
stress lowering of the stresses that have already
been assigned. Our rule system, however, assigns
secondary stress in monomorphemic words which is then
raised to primary stress by an additional rule in
both monomorphemic words and the first part of a
compound (cf. (11)). The number following a feature
indicates that this item must be present at least that
number of times.
(10) VC = +<+voc>   ! So @ cannot be stressed
          +'@
     a) <+voc,+long> -> <2st> / _ CONS,#
        VC -> <2st> / _ CONS2,#
        VC -> <2st> / #,CONS0 _ CONS0,#
     b) VC -> <2st> / ...
        ... _ CONS0,+II    ,CONS0,+<+voc,-st>,CONS0,#
                    +<-st>        +'II
                                  +'@
     c) VC -> <2st> / ...
        ... {<-st>},CONS0 _ CONS0,<+voc,-st>,CONS0,#
            {#    }
(11) <2st> -> <1st> / #,#,<+segm>0 _
Therefore, before assigning phonemes to graphemes,
the grapheme string has to be changed in such a way
as to mark those affixations and compoundings that
influence this assignment and the subsequent
phonological operations. The operations in this module are
also applied by rule, in the format already dealt
with. This approach is not obvious from the outset.
However, the affixes influencing the conversion and
the phonology are limited and as such, they can be
recognized by rule. Furthermore, many compounds have
internal grapheme sequences which do not occur in
monomorphemic words. On the basis of these sequences,
boundaries can be inserted. Of course, this leaves us
with compounds which do not have clusters which are
impermissible in monomorphemes. An exception lexicon
will then be necessary.
Consider the following example. In Dutch, a
sequence of an obstruent followed by a voiced obstruent
hardly ever occurs in monomorphemic words (cf.
Zonneveld (1983)). This characteristic is also
represented in the spelling. Thus, a rule of the form
in (12) separates parts of compounds.
(12) ∅ -> # / <-son>i _ <-son,+voice>j,CONS0,VOC
     where i and j indicate different segments
After application of rule (12) huisdeur (front door)
is represented as huis#deur. In fact, this rule is
somewhat more complicated by using coordinations,
since we also have to exclude overgeneralizations
such as huis#den (lived). The graphemes and boundary
in huis#deur are assigned phonemes by conversion
rules in the next module; furthermore, the stress
rules dealt with above assign stress, and finally the
phonological rule of regressive voicing assimilation
applies, converting S into Z. The separate stages of
the derivation of huisdeur are shown in (13).
(13) input:            h u i s d e u r
     boundaries:   # # h u i s # d e u r # #
     conversion:   # # H UI S # D EU R # #
     sec. stress:  # # H "UI S # D "EU R # #
     prim. stress: # # H 'UI S # D "EU R # #
     phonology:    # # H 'UI Z # D "EU R # #
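
Two of the stages in (13) can be sketched in a few lines: the boundary insertion of rule (12) on the grapheme level and regressive voicing assimilation on the phoneme level. This is a minimal illustration in Python, not the system's implementation. The letter sets standing in for <-son>, <-son,+voice>, CONS and VOC, the choice of B and D as triggers, and the voicing table are assumptions, and the coordination that excludes overgeneralizations is deliberately left out.

```python
import re

VOWELS = "aeiouy"                   # assumption: stands in for VOC
CONSONANTS = "bcdfghjklmnpqrstvwxz"  # assumption: stands in for CONS
OBSTRUENTS = "bdfgkpstvz"           # assumption: stands in for <-son>
VOICED_OBSTRUENTS = "bdgvz"         # assumption: stands in for <-son,+voice>

# An obstruent, then (lookahead) a different voiced obstruent, CONS0, VOC.
BOUNDARY = re.compile(
    rf"([{OBSTRUENTS}])(?=(?!\1)[{VOICED_OBSTRUENTS}][{CONSONANTS}]*[{VOWELS}])"
)

def insert_boundaries(word):
    """Simplified rule (12): insert # between the two obstruents."""
    return BOUNDARY.sub(r"\1#", word)

VOICED_COUNTERPART = {"S": "Z", "F": "V", "P": "B", "T": "D"}  # assumption
TRIGGERS = {"B", "D"}                                           # assumption

def regressive_voicing(phonemes):
    """Voice an obstruent when the next segment, skipping boundaries, is a trigger."""
    result = list(phonemes)
    for i, phoneme in enumerate(result):
        j = i + 1
        while j < len(result) and result[j] == "#":
            j += 1
        if phoneme in VOICED_COUNTERPART and j < len(result) and result[j] in TRIGGERS:
            result[i] = VOICED_COUNTERPART[phoneme]
    return result
```

Applied to huisdeur, insert_boundaries yields huis#deur, and regressive_voicing turns S into Z in the phoneme string of (13). The same simplified pattern also splits huisden into huis#den, which is exactly the overgeneralization that the full rule has to exclude.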
2. Implementation
Now that the linguist has provided the rules it is
necessary to consider these from a technical point of
view. In these linguistic rules one is able to
describe the phoneme-to-grapheme assignment in terms of
graphemes, phonemes and their context. In principle,
there are two basic possibilities to do so. Either
one refers directly to a basic entity (e.g. grapheme
or phoneme, in which case the structure has a fixed
length) or one uses a larger structure that describes
the context in a more complex manner. A basic entity
can be referred to explicitly, by stating the
grapheme or phoneme involved, but also implicitly, by
specifying some features to define a set of phonemes
or graphemes for which the context is valid. As to
the structures, several are available. The first one
is alternative validity: one of the specified struc- 
tures must be valid to produce a match (cf. rule 
(2)). The opposite structure is simultaneous 
validity: all of the specified structures must be 
valid to produce a match (cf. (8)). A third possibil-
ity is negated validity: when the structure is valid, 
no match will result and vice versa (cf. (3)). A 
fourth (commonly used) structure is optional 
validity: the specified structure may or may not be 
present (cf. (12)). 
It will be clear that the system gains consider- 
ably in power by allowing combinations of these 
structures. For the implementation this has some
non-trivial consequences. First of all, a suitable data
representation must be found to store the linguistic 
knowledge. It is desirable to have this representa- 
tion in a compact and efficient form, as it will be 
consulted 'on-line' during the transcription process. 
Because there are no restrictions on the use of these
structures, the use of a dynamic data structure to
represent the knowledge seems appropriate in order to
prevent wasting memory.
This dynamic data structure consists of a variable 
number of linked units. Each unit consists of a 
number of 'fields' to represent the different types
of information one needs. This is illustrated below. 
In the field type an indication is given of the type 
of information and its location. Next, there are four 
fields, one of which contains the linguistic informa- 
tion formulated in the rules. If the unit is a gra- 
pheme, the grapheme concerned is stored in the field
graph. As a phoneme is seen as a different entity, a
phoneme will be represented in the next field,
denoted by phon. Then it is possible to use features to
indicate a set of phonemes or graphemes. These are of
two different types but are stored in the same field
feat. This actually is a pointer to a list of 
features, containing the value (+ or -) and the 
feature concerned. The final information field con- 
tains a list of other units. This is needed to 
represent the alternative or simultaneous validity. 
Both of these structures are stored in this list, and 
in combination with the first field containing the 
type, the system knows how to interpret this list. 
The optional validity can be seen as an alternative 
validity: either the structure is present or not, and 
this is therefore represented as an alternative vali- 
dity. Finally certain types do not contain any infor- 
mation, but are used as markers, i.e. negation and 
end negation which denote the beginning and end of a 
negated structure. Following these information 
fields, a last field next will refer to a following 
unit, which describes the next part in the linguistic
rule. As an example of how this data structure 
represents linguistic rules, the lefthand context of 
rule (8) is shown in (15). Since the focus is the 
starting point, the data structure is constructed 
from right to left. 
(15) [figure not recoverable: linked units for the lefthand context of rule (8)]
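
The unit layout described above can be sketched as a small linked structure. This is a minimal illustration in Python, not the original implementation. The field names type, graph, phon, feat and next follow the text; the name sub for the list of other units, the type labels, the reading of VOC and CONS as the features <+voc> and <+cons>, and c as the negated grapheme in rule (8) are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Unit:
    type: str                         # kind of information held by this unit
    graph: Optional[str] = None       # a grapheme, if the unit is a grapheme
    phon: Optional[str] = None        # a phoneme, if the unit is a phoneme
    feat: List[Tuple[str, str]] = field(default_factory=list)  # (value, feature)
    sub: List["Unit"] = field(default_factory=list)  # alternative/simultaneous members
    next: Optional["Unit"] = None     # following unit, moving away from the focus

def walk(unit):
    """Yield a unit and everything reachable through its next field."""
    while unit is not None:
        yield unit
        unit = unit.next

# Lefthand context of rule (8), built from the focus outwards (right to left).
voc = Unit("feat", feat=[("+", "voc")])

# Negated grapheme, enclosed by the two marker units described in the text.
end_negation = Unit("end_negation")
negated_c = Unit("graph", graph="c", next=end_negation)
negation = Unit("negation", next=negated_c)

cons = Unit("feat", feat=[("+", "cons")])

# Simultaneous validity: both members must hold for the same position.
left_context = Unit("simultaneous", sub=[cons, negation], next=voc)
```

Walking left_context visits the coordination first and the vowel second, which mirrors the statement that the structure is constructed starting at the focus; the marker units carry no information of their own and only delimit the negated part.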
An input string will now be transcribed by comparing 
it with the appropriate units. While consulting this 
data structure an unexpected problem arises. Because 
of the freedom the user has to combine different 
structures, it is possible to build a structure which 
has different lengths (number of units) for different
paths. The unit or structure following this variable 
length structure no longer has a fixed position with 
regard to the starting point. This may especially 
create problems when this first structure is negated 
as well. This can be explained best with an example 
which is hypothetical. The structure, however, could 
easily be used by a linguist and thus the system 
should be able to handle it correctly. Suppose the 
right context of a linguistic rule looks as follows. 
(16) _ '{a  },t
        {o,u}
The rule states that it is not permitted that either
an a, or an o followed by a u, is present before a
t. The question is what exactly is meant by this
negation. A negation is only meaningful within a
closed set, and therefore the set is defined
implicitly by the unit or structure being negated. 'a (not
a) means: all graphemes except a. '[o,u] means: all
sequences of two graphemes except the sequence ou.
The sequence at will therefore belong to this second
set. If the input string now consists of att, the
first path will reject the string, but the second
path will approve of it. As both paths must approve
of the string to produce a match, this string will be
rejected. However, it is insufficient only to look at
the negated part (and then when no match is detected,
consult the positive part). An input string art would
then be rejected on account of the leading a, which
would be incorrect. As there is no t directly
following the a, the first path can give no verdict on
the string and should pass it to the second path
which would approve it. It is therefore necessary to
consider all paths in combination with all following
units. For further discussion of this particular
problem, we refer to Van Leeuwen et al. (1986).
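
One way to read the behaviour described for (16) is sketched below in Python. It is an interpretation of the text, not the algorithm of the system: each path gives a verdict only when the following unit is found directly after a stretch of that path's length, and the context matches when at least one path approves and no path rejects. The function name and its parameters are hypothetical.

```python
def negated_alternatives_match(text, pos, alternatives, follower):
    """Check a negated set of alternatives followed by a fixed unit.

    alternatives: the negated paths, e.g. ["a", "ou"] for (16)
    follower:     the unit after the negated structure, e.g. "t"
    """
    verdicts = []
    for alt in alternatives:
        end = pos + len(alt)
        # A path can only judge the string if the follower sits right behind it.
        if text[end:end + len(follower)] != follower:
            continue
        # Approve when the stretch of this length differs from the negated path.
        verdicts.append(text[pos:end] != alt)
    # Every path that gives a verdict must approve, and at least one must do so.
    return bool(verdicts) and all(verdicts)
```

For the input att the first path rejects and the second approves, so there is no match; for art the first path gives no verdict and the second approves, so the context matches. Looking at the negated alternatives alone, without the following t, would wrongly reject art.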
3. Concluding Remarks 
In this paper we have dealt with the system for gra-
pheme-to-phoneme conversion in Dutch as it is being 
developed at the Universities of Utrecht and Leyden 
and IPO, Eindhoven. We have shown that knowledge 
about spelling and phonology provides a proper gram- 
mar for automatic phoneme-to-grapheme assignment and
that linguistic rules can be implemented without ad 
hoc mechanisms. Speed was considered an important 
performance feature in constructing the database as 
well as in consulting it. Typical values are: 20 
seconds to (re)construct a new database (for instance 
for testing new rules or new versions of rules), and 
some 25 grapheme-to-phoneme conversions per second.
This phoneme-to-grapheme assignment system has been 
linked to the diphone speech synthesis system that 
has been developed at IPO. At the moment, the system 
is being tested on a lexicon of about 4000
monomorphemic words.

References 

CHOMSKY, N. AND N. HALLE (1968), The Sound Pattern of 
English. Harper and Row, New York. 

HERTZ, S.R. (1982), From text to speech with SRS. In: 
JASA 74, pp. 1155-1170. 

KLATT, D.H. (1976), Structure of a phonological rule
component for a synthesis-by-rule program. In:
IEEE Transactions on Acoustics, Speech, Signal
Processing ASSP-24, pp. 291-298. 

LEEUWEN H. VAN, S. LANGEWEG AND E. BERENDSEN (1986), 
'Linguistics as an Input for a Flexible 
Grapheme-to-Phoneme Conversion System in Dutch'. 
In: Proceedings of the IEE Conference on Speech 
Input/Output; techniques and applications, pp. 
200-206. 

WESTER, J. (1984), SF: Contouren van een toegepaste 
fonologie. In: De Nieuwe Taalgids 77, pp. 30-43.

ZONNEVELD, W. (1983), Lexical and Phonological
Properties of Dutch Voicing Assimilation. In:
M. v.d.Broecke, V.v.Heuven, W.Zonneveld (eds.),
Sound Structures; studies for Antonie Cohen. 
Foris, Dordrecht. 
