GRAFON: A GRAPHEME-TO-PHONEME CONVERSION SYSTEM 
FOR DUTCH 
Walter Daelemans
AI-LAB, Vrije Universiteit Brussel
Pleinlaan 2 K, 1050 Brussels, Belgium
E-mail: walterd@arti.vub.uucp
Abstract
We describe a set of modules that together make up a
grapheme-to-phoneme conversion system for Dutch. Modules
include a syllabification program, a fast morphological parser, a
lexical database, a phonological knowledge base, transliteration
rules, and phonological rules. Knowledge and procedures were
implemented object-orientedly. We contrast GRAFON to recent
pattern recognition and rule compiler approaches and try to show
that the first fails for languages with concatenative compounding
(like Dutch, German, and Scandinavian languages) while the
second lacks the flexibility to model different phonological
theories. It is claimed that syllables (and not
graphemes/phonemes or morphemes) should be central units in a
rule-based phonemisation algorithm. Furthermore, the
architecture of GRAFON and its user interface make it ideally
suited as a rule testing tool for phonologists.
1. INTRODUCTION
Speech synthesis systems consist of a linguistic and an
acoustic part. The linguistic part converts an orthographic
representation of a text into a phonetic representation flexible
and detailed enough to serve as input to the acoustic part.
The acoustic part is a speech synthesiser which may be based
on the production of allophones or diphones. This paper is
concerned with the linguistic part of speech synthesis for
Dutch (a process we will call phonemisation). The problem
of phonemisation has been approached in different ways.
Recently, connectionist approaches (NETtalk: Sejnowski and
Rosenberg, 1987) and memory-based reasoning approaches
(MBRtalk: Stanfill and Waltz, 1986) have been proposed as
alternatives to the traditional symbol-manipulation approach.
Within the latter (rule-based) approach, several systems have
been built for English (the most comprehensive of which is
probably MITalk; Allen, Hunnicutt and Klatt, 1987), and
systems for other European languages are beginning to appear.
Text-to-speech systems for Dutch are still in an
experimental stage, and two different designs can be distinguished.
Some researchers adopt an 'expert system' pattern matching
approach /Boot, 1984/, others a 'rule compiler' approach
/Kerkhoff, Wester and Boves, 1984; Berendsen, Langeweg
and van Leeuwen, 1986/ in which the rules are mostly in an
SPE-inspired format. Both approaches take the
grapheme/phoneme as a central unit. We will argue that
within the symbol manipulation approach, a modular
architecture with the syllable as a central unit is to be preferred.
The research described in this paper was supported partly by the
European Community under ESPRIT project OS 82. The paper is
based on an internal memo (Daelemans, 1985) and on part of a
dissertation (Daelemans, 1987b). The system described here is not
to be confused with the GRAPHON system developed at the
Technische Universität Wien (Pounder and Kommenda, 1986), which is
a text-to-speech system for German. I am grateful to my former
and present colleagues in Nijmegen and Brussels for providing a
stimulating working environment. Erik Wybouw developed C-code
for constructing an indexed-sequential version of the lexical
database.
The architecture of GRAFON as it is currently
implemented is shown in Figure 1.
[Figure 1: diagram. Processing modules from top to bottom, with the
representation each produces for "het gieren van de herfststorm":
MORPHOLOGICAL ANALYSIS      ##het##gieren##van##de##herfst#storm##
SYLLABIFICATION             ##het##gie$ren##van##de##herfst#storm##
(word accent retrieved)     ##het##+gie$ren##+van##de##++herfst#+storm##
TRANSLITERATION MAPPINGS
PHONOLOGICAL RULES          ət 'Xi:rə 'vɑn də 'hɛrəf,storəm]
Figure 1. Architecture of GRAFON. Dark boxes indicate knowledge sources,
white boxes processing modules. After computing morphological and
syllable boundaries, the system retrieves word accent information and
applies transliteration mappings and phonological rules to the input.
Resulting representations are shown within the boxes.
An input string of orthographic symbols is first analysed
morphologically. Then, syllable boundaries are computed,
taking into account the morphological boundaries.
Morphological analysis uses a lexical database, which is also used to
retrieve word stress of monomorphematic word forms. The
actual transcription takes the syllable as a basic unit and
proceeds in two stages: first, parts of spelling syllables are
transliterated into (strings of) phoneme symbols by a number
of transliteration mappings. To this representation,
context-sensitive phonological rules are then applied, modifying
parts of the syllables in the process. Any level of phonetic detail
(between a broad and a narrow transcription) can be obtained
by adding or blocking rules.
In the remainder of this paper, we will describe the
different modules playing a role in GRAFON in some detail,
go into some language-specific requirements, and discuss the
advantages of our architecture over alternative designs.
2. SYLLABIFICATION 
Information about the position of syllable boundaries in
spelling input strings is needed for several reasons. The most
important of these is that most phonological rules in Dutch
have the syllable as their domain. E.g. Dutch has a
schwa-insertion rule inserting a schwa-like sound between a liquid
and a nasal or non-coronal consonant if both consonants
belong to the same syllable. Compare melk (milk): /melək/
to mel$ken (to milk): /melkə/ ($ indicates a syllable
boundary). Without syllable structure this problem can only be
resolved in an ad hoc way. Furthermore, stress assignment
rules should be described in terms of syllable structure
/Berendsen and Don, 1987/.
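The syllable-bound behaviour of schwa insertion can be illustrated with a small Python sketch. It is not GRAFON's rule: the function name insert_schwa, the representation of syllables as lists of phoneme symbols, and the trigger set TRIGGERS (a hand-picked sample of nasals and non-coronal consonants) are assumptions made for illustration only.

```python
# Minimal sketch of schwa insertion with the syllable as its domain.
# Assumption: each syllable is a list of phoneme symbols; TRIGGERS is an
# illustrative subset of nasals and non-coronal consonants.
LIQUIDS = {"l", "r"}
TRIGGERS = {"m", "n", "p", "k", "f", "x"}
SCHWA = "ə"

def insert_schwa(syllables):
    """Insert a schwa between a liquid and a trigger inside one syllable."""
    result = []
    for syl in syllables:
        out = []
        for i, phoneme in enumerate(syl):
            out.append(phoneme)
            following = syl[i + 1] if i + 1 < len(syl) else None
            # Only pairs inside the same syllable are inspected, so a
            # syllable boundary between liquid and consonant blocks the rule.
            if phoneme in LIQUIDS and following in TRIGGERS:
                out.append(SCHWA)
        result.append(out)
    return result

def show(syllables):
    """Join syllables with the $ boundary symbol used in the text."""
    return "$".join("".join(syl) for syl in syllables)

print(show(insert_schwa([list("melk")])))             # melk: one syllable
print(show(insert_schwa([list("mel"), ["k", "ə"]])))  # mel$ken: two syllables
```

Because the rule never looks across a syllable boundary, no extra condition is needed to keep it from applying in mel$ken.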
Other rules which are often described as having the
morpheme as their domain (such as devoicing of voiced
obstruents at morpheme-final position and progressive and
regressive assimilation) should really be described as
operating on the syllable level. E.g. het$ze (/hetsə/: smear
campaign; devoicing of voiced fricative at syllable-final position)
and as$best (/azbest/: asbestos; regressive assimilation). These
mono-morphematic words show the effects of the
phonological rules at their syllable boundaries. Furthermore, the proper
target of these rules is not one phoneme, but the complete
coda or onset of the syllable, which may consist of more
than one phoneme.
Although these examples show convincingly that syllable
structure is necessary, they do not prove that it is central.
However, the following observations seem to suggest the
centrality of the syllable in Dutch phonemisation:
- The combination of syllable structure and information about
word stress seems enough to transform all spelling vowels
correctly into phonemes, including Dutch grapheme <e>,
which is a traditional stumbling block in Dutch
phonemisation. Usually, many rules or patterns are needed to transcribe
this grapheme adequately.
- All phonological rules traditionally discussed in the
literature in terms of morpheme structure can be defined
straightforwardly in terms of syllable structure without generating
errors.
These facts led us to incorporate a level of syllable
decomposition into the algorithm. This module takes spelling
strings as input. Automatic syllabification (or hyphenation) is
a notoriously thorny problem for Dutch language technology.
Dutch syllabification is generally guided by a phonological
maximal onset principle, which states that between
two vowels, as many consonants belong to the second
syllable as can be pronounced together. This results in
syllabifications like groe-nig (greenish), I-na (a name) and
bad-stof (terry cloth). However, this principle is sometimes
overruled by a morphological principle. Internal word
boundaries (to be found after prefixes, between parts of a
compound and before some suffixes) always coincide with syllable
boundaries. This may contradict the syllable boundary position
predicted by the maximal onset principle. E.g. groen-achtig
(greenish, groe-nachtig expected), in-enten (inoculate, i-nenten
expected) and stads-tuin (city garden, stad-stuin expected). In
Dutch (and German and Scandinavian languages), unlike in
English and French, compounding happens through
concatenation of word forms (e.g. compare Dutch spelfout or
German Rechtschreibungsfehler to French faute d'orthographe or
English spelling error). Because of this, the default
phonological principle fails in many cases (we calculated this
number to be on the average 6% of word forms for Dutch).
We therefore need a morphological analysis program to detect
internal word boundaries. By incorporating a morphological
parser, the syllabification module of GRAFON is able (in
principle) to find the correct syllable boundaries in the
complete vocabulary of Dutch (i.e. all existing and all possible
words). Difficulties remain, however, with foreign words
and a pathological class of word forms with more than one
possible syllabification, e.g. balletje may be hyphenated
bal-let-je (small ballet) and bal-le-tje (small ball). Syllabification
in languages with concatenative compounding is discussed in
more detail in Daelemans (1988, forthcoming).
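The interaction between the maximal onset principle and the morphological principle can be sketched in Python. This is an illustration only, not GRAFON's syllabifier: the onset inventory ONSETS is a small hand-picked sample, a run of vowel letters is treated as a single nucleus, and "#" is assumed to mark the internal word boundaries delivered by morphological analysis.

```python
import re

VOWELS = "aeiouy"
# Assumption: an illustrative sample of legal Dutch spelling onsets.
ONSETS = set("bcdfghjklmnprstvwz") | {
    "bl", "br", "ch", "dr", "fl", "fr", "gl", "gr", "kl", "kn", "kr",
    "pl", "pr", "sch", "schr", "sl", "sm", "sn", "sp", "spl", "spr",
    "st", "str", "tr", "tw", "vl", "vr", "zw",
}

def syllabify_part(part):
    """Maximal onset syllabification of one morphologically simple part."""
    chunks = re.findall(rf"[{VOWELS}]+|[^{VOWELS}]+", part.lower())
    syllables, current = [], ""
    for i, chunk in enumerate(chunks):
        intervocalic = 0 < i < len(chunks) - 1
        if chunk[0] in VOWELS or not intervocalic:
            current += chunk
            continue
        # Give the second syllable the longest legal onset; the rest of
        # the consonant cluster stays behind as coda of the first syllable.
        k = next((k for k in range(len(chunk)) if chunk[k:] in ONSETS),
                 len(chunk))
        syllables.append(current + chunk[:k])
        current = chunk[k:]
    return syllables + [current] if current else syllables

def syllabify(word):
    """Internal word boundaries (#) always become syllable boundaries."""
    return "-".join("-".join(syllabify_part(p)) for p in word.split("#"))

for w in ("groenig", "groenachtig", "groen#achtig", "stads#tuin"):
    print(w, "->", syllabify(w))
```

Without the boundary marker the sketch yields the phonologically expected but wrong groe-nach-tig; with it, the morphological principle takes precedence simply because each part is syllabified on its own.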
3. LEXICAL DATABASE 
We use a word form dictionary instead of a morpheme
dictionary. At present, some 10,000 citation forms with their
associated inflected forms (computed algorithmically) are
listed in the lexical database. The entries were collected by
the University of Nijmegen from different sources. The
choice for a word form lexical database was motivated by the
following considerations: First, morphological analysis is
reduced to dictionary lookup, sometimes combined with
compound and affix analysis. Complex word forms (i.e. frequent
compounds and word forms with affixes) are stored with their
internal word boundaries. These boundaries can therefore be
retrieved instead of computed. Only the structure of complex
words not yet listed in the dictionary must be computed.
This makes morphological decomposition computationally less
expensive.
Second, the number of errors in morphological parsing
owing to overacceptance and nonsense analyses is
considerably reduced. Traditional erroneous analyses of systems
using a morpheme-based lexicon like comput+er and
under+stand, or for Dutch kwart+el (quarter yard instead of
quail) and li+epen (plural past tense of lopen, to run;
analysed as 'epics about the Chinese measure li') are avoided
this way. Finally, current and forthcoming storage and
search technology reduce the overhead involved in using large
lexical databases considerably.
Notice that the presence of a lexical database suggests a
simpler solution to the phonemisation problem: we could
simply store the transcription with each entry. (This lexicon-based
approach is pursued for Dutch by Lammens, 1987.)
However, we need the algorithm to compute these transcriptions
automatically, and to compute transcriptions of (new) words
not listed in the lexical database. Furthermore, the absence
of a detailed rule set makes a lexicon-based approach less
attractive from a linguistic point of view. Also, from a
technological point of view it is a shortcoming that the phonetic
detail of the transcription cannot be varied for different
applications.
Our lexical database system can be functionally
interpreted as consisting of two layers: a static storage level in
which word forms are represented as records with fields
pointing to other records and fields containing various kinds
of information, and a dynamic knowledge level in which
word forms are instances of linguistic objects grouped in
inheritance hierarchies, and have available to them (through
inheritance) various kinds of linguistic knowledge and
processes. This way new entries and new information
associated with existing entries can be dynamically created, and
(after checking by the user) stored in the lexical database.
This lexical database architecture is described in more detail
in Daelemans (1987a).
4. MORPHOLOGICAL ANALYSIS 
Morphological analysis consists of two stages:
segmentation and parsing. The segmentation routine finds possible
ways in which the input string can be partitioned into
dictionary entries (working from right to left). In the present
application, segmentation stops with the 'longest' solution.
Continuing to look for analyses with smaller dictionary entries
leads to a considerable loss in processing efficiency and an
increased risk of nonsense analyses. The loss in accuracy is
minimal (recall that the internal structure of word forms
listed in the lexical database can be retrieved).
Some features were incorporated to constrain the number
of dictionary lookups necessary: the most efficient of these
are a phonotactic check (strings which do not conform to the
morpheme structure conditions of Dutch are not looked up),
and a special memory buffer (substrings already looked up
are cached with the result of their lookup; during
segmentation, the same substrings are often looked up more than
once).
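The segmentation strategy (right to left, longest entry first, with a phonotactic filter and a lookup cache) might be sketched as follows. Everything here is an assumption for illustration: LEXICON is a toy word list, plausible() is a crude stand-in for the morpheme structure conditions of Dutch, and the function names are not taken from GRAFON.

```python
# Toy word form lexicon (assumption; GRAFON uses a large lexical database).
LEXICON = {"herfst", "storm", "spel", "fout", "tuin", "stad"}
lookup_cache = {}   # substring -> result of an earlier dictionary lookup

def plausible(s):
    """Crude phonotactic check: a candidate entry needs at least one vowel."""
    return any(c in "aeiouy" for c in s)

def lookup(s):
    """Dictionary lookup, skipped for implausible strings and cached."""
    if not plausible(s):
        return False
    if s not in lookup_cache:
        lookup_cache[s] = s in LEXICON   # the only real dictionary access
    return lookup_cache[s]

def segment(word):
    """Return the first segmentation found when trying the longest
    dictionary entry at the right edge first, or None if there is none."""
    if word == "":
        return []
    for start in range(len(word)):       # start == 0 is the whole word
        suffix = word[start:]
        if lookup(suffix):
            rest = segment(word[:start])
            if rest is not None:
                return rest + [suffix]
    return None

print(segment("herfststorm"))
print(segment("spelfout"))
```

Stopping at the first solution mirrors the 'longest' strategy described above; a parser would still have to accept or reject the resulting combination of entries.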
The parsing part of morphological analysis uses a
compound grammar and a chart parser formalism to accept or
reject combinations of dictionary entries. It works from left
to right. It also takes into account spelling changes which
may occur at the boundary of two parts of a compound
(these are called linking graphemes, e.g. hemelSblauw,
sky-blue; eiERdooier, egg-yolk).
During dictionary lookup, word stress is retrieved for
the dictionary entries (this part of the process could be
replaced by additional rules, but as word stress was
available in the lexical database, we only had to define the rules
for stress assignment in new compounds).
5. PHONOLOGICAL KNOWLEDGE
Knowledge about Dutch phonemes is implemented by
means of a type hierarchy, by inheritance and by associating
features to objects, in a standard object-oriented way.
Information about a particular phonological object can be available
through feature inheritance, by computing a method or by
returning the stored value of a feature. However, the exact
way information from the phonological knowledge base is
retrieved is hidden from the user. An independent interface
to the knowledge base is defined consisting of simple
LISP-like predicates and (transformation) functions in a uniform
format, e.g. (obstruent? x), (syllabic? x), (make-voiced x)
etc. The answer can be true, false, a numerical value when
a gradation is used, a special message (undefined), or in the
case of transformation functions, a phoneme or string of
phonemes. These functions and predicates, combined with
Boolean operators AND, OR and NOT, are used to write the
conditions and actions of the phonological rules. The
interface allows us to model different theoretical formalisms using
the same knowledge base. E.g. the generative phonology
formalism can be modelled at the level of the interface
functions.
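A type hierarchy hidden behind a uniform predicate interface could look roughly like the following Python sketch. The class names, the tiny phoneme inventory and the voicing pairs are assumptions made for illustration; only the predicate names echo the interface functions mentioned in the text.

```python
# Type hierarchy: features are inherited along the class chain.
class Phoneme:
    syllabic = False
    def __init__(self, symbol, voiced):
        self.symbol, self.voiced = symbol, voiced

class Vowel(Phoneme):
    syllabic = True

class Consonant(Phoneme): pass
class Sonorant(Consonant): pass
class Obstruent(Consonant): pass
class Stop(Obstruent): pass
class Fricative(Obstruent): pass

# Assumption: a small sample inventory, not the knowledge base of GRAFON.
INVENTORY = {p.symbol: p for p in (
    Stop("p", False), Stop("b", True), Stop("t", False), Stop("d", True),
    Fricative("f", False), Fricative("v", True),
    Fricative("s", False), Fricative("z", True),
    Sonorant("m", True), Sonorant("l", True), Vowel("a", True),
)}
VOICED_OF = {"p": "b", "t": "d", "f": "v", "s": "z"}
VOICELESS_OF = {v: k for k, v in VOICED_OF.items()}

# Uniform interface: rules only see these functions, never the classes.
def obstruent_p(x):    return isinstance(INVENTORY[x], Obstruent)
def syllabic_p(x):     return INVENTORY[x].syllabic        # inherited feature
def voiced_p(x):       return INVENTORY[x].voiced          # stored feature
def make_voiced(x):    return VOICED_OF.get(x, x)          # transformation
def make_voiceless(x): return VOICELESS_OF.get(x, x)

print(obstruent_p("s"), syllabic_p("a"), make_voiced("s"))
```

Whether an answer comes from a stored value, an inherited default or a computation is invisible to the caller, which is the point of the interface layer.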
The morphological analysis and syllabification stages in
the algorithm output a list of syllables in which internal and
external word boundaries and word stress are marked. Each
syllable becomes an instance of the object type syllable,
which has a set of features associated with it. Figure 2 lists
these features and their values for one particular syllable.
##het##+gie$ren##+van##de##++herfst#+storm
<syl 5467>
  spelling                 herfst
  closed?                  true
  stressed?                1
  previous-syllable        <syl 5466>
  next-syllable            <syl 5468>
  external-word-boundary?  false
  internal-word-boundary?  true
  structure                h e rfst
  onset                    /h/
  nucleus                  /ɛ/
  coda                     /rəf/
  transcription            /hɛrəf/
Figure 2. Example instance of the object type SYLLABLE and its associated
feature values after transcription.
The value of some of these features is set by means of
information in the input: spelling, closed? (true if the syllable ends
in a consonant), stressed? (1 if the syllable carries primary
stress, 2 if it carries secondary stress), previous-syllable and
next-syllable (pointers to the neighbouring syllables),
external-word-boundary? (true if an external word boundary
follows), internal-word-boundary? (true if an internal word
boundary follows). The values of these features are used by
the transliteration and phonological rules. Of other features,
the value must be computed: structure is computed on the
basis of the spelling feature. The value of this feature reflects
the internal structure of the spelling syllable in terms of
onset, nucleus and coda. The features onset, nucleus and
coda (this time referring to the phonological syllable) are
computed by means of the transliteration and phonological
rules. Their initial values are the spelling, their final values
are the transcription. The rules have access to the value of
these features and may change it. The feature transcription
stands for the concatenation of the final or intermediate values
of onset, nucleus and coda.
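A syllable object with these features could be sketched in Python as a dataclass. This is a hedged illustration, not the Flavors implementation: the field names are transliterations of the feature names in the text, and splitting the spelling with a vowel-letter regular expression is a simplifying assumption.

```python
import re
from dataclasses import dataclass, field
from typing import Optional, Tuple

# Assumption: onset / nucleus / coda of a spelling syllable are found by
# splitting around the first run of vowel letters.
_SHAPE = re.compile(r"([^aeiouy]*)([aeiouy]+)([^aeiouy]*)")

@dataclass(eq=False)
class Syllable:
    spelling: str
    stressed: int = 0                      # 1 = primary, 2 = secondary
    external_word_boundary: bool = False
    internal_word_boundary: bool = False
    previous_syllable: Optional["Syllable"] = field(default=None, repr=False)
    next_syllable: Optional["Syllable"] = field(default=None, repr=False)
    structure: Tuple[str, str, str] = field(init=False, default=("", "", ""))
    onset: str = field(init=False, default="")
    nucleus: str = field(init=False, default="")
    coda: str = field(init=False, default="")

    def __post_init__(self):
        match = _SHAPE.fullmatch(self.spelling)
        if match is None:
            raise ValueError(f"cannot analyse syllable {self.spelling!r}")
        self.structure = match.groups()
        # Initial values of the phonological features are the spelling.
        self.onset, self.nucleus, self.coda = self.structure

    @property
    def closed(self):
        return self.structure[2] != ""

    @property
    def transcription(self):
        # Concatenation of the current (intermediate or final) values.
        return self.onset + self.nucleus + self.coda

syl = Syllable("herfst", stressed=1, internal_word_boundary=True)
syl.nucleus, syl.coda = "ɛ", "rəf"   # values that rules might assign
print(syl.structure, syl.closed, syl.transcription)
```

Keeping structure (spelling-based) separate from onset, nucleus and coda (rewritten by rules) lets a rule consult the original spelling even after transcription has started.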
Transliteration rules are mappings from elements of
syllable structure to their phonological counterpart. E.g. the
syllable onset <sch> is mapped to /sX/, nucleus <ie> to /i/,
and coda <x> to /ks/. Conditions can be added to make the
mapping context-sensitive: onset <c> is mapped to /s/ if a
front vowel follows, and to /k/ if a back vowel follows.
There are about forty transliteration mappings.
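The flavour of such mappings can be shown with a short Python sketch covering only the four examples just mentioned. The table layout, the function names and the set FRONT_VOWEL_LETTERS are assumptions; the actual set of about forty mappings is not given in the paper.

```python
# Assumption: spelling letters that count as front vowels for the <c> rule.
FRONT_VOWEL_LETTERS = set("eiy")

NUCLEUS_MAP = {"ie": "i"}
CODA_MAP = {"x": "ks"}

def map_onset(onset, nucleus):
    """Onset mappings; the one for <c> is context-sensitive."""
    if onset == "sch":
        return "sX"
    if onset == "c":
        return "s" if nucleus[:1] in FRONT_VOWEL_LETTERS else "k"
    return onset

def transliterate(onset, nucleus, coda):
    """Map the three parts of a spelling syllable to phoneme strings.
    Parts without a listed mapping are passed through unchanged."""
    return (map_onset(onset, nucleus),
            NUCLEUS_MAP.get(nucleus, nucleus),
            CODA_MAP.get(coda, coda))

print(transliterate("c", "e", "nt"))   # <c> before a front vowel
print(transliterate("c", "a", "fe"))   # <c> before a back vowel
print(transliterate("sch", "ie", "t"))
```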
The phonological rules apply to the output of the
transliteration mappings (which may be regarded as some kind of
broad transcription). They are sequentially ordered. Each
rule is an instance of the object type phonological-rule, which
has six features: active-p, domain, application, conditions,
actions and examples. A rule can be made active or inactive
depending on the value of active-p. If it is true, sending an
application message to the rule results in checking the
conditions on a part of the input string constrained by domain
(which at present can be syllable, morpheme, word or
sentence). If the conditions return true, the actions expression is
executed. Actions may also involve the triggering of other
rules. E.g. schwa-insertion triggers re-syllabification.
Conditions and actions are written in a language consisting of the
phonological functions and predicates mentioned earlier (they
access the phonological knowledge base and features of
syllables), Boolean connectors, and simple string-manipulation
functions (first, last etc.). After successful application of a
rule, the input string to which it was applied is stored in the
examples feature. This way, interesting data about the
operation of the rule is available from the rule itself. In Figure 3
some examples of rules are shown. Different notations for
these rules are possible, e.g. the similarity between both rules
could be exploited to merge them into one rule.
6. RELATED RESEARCH 
In the pattern recognition approach advocated by Martin
Boot (1984), it is argued that affix-stripping rules (without
using a dictionary) and a set of context-sensitive pattern
matching rules suffice to phonemise spelling input. Boot states
that 'there is no linguistic motivation for a phonemisation
model in which syllabification plays a significant role'. We
REGRESSIVE ASSIMILATION
Active? True
Domain? Syllable
Conditions
  (let ((coda-1 (last (coda SYL)))
        (onset-2 (first (onset (next SYL)))))
    (and
      (stop? onset-2) (voiced? onset-2)
      (obstruent? coda-1)
      (not (voiced? coda-1))))
Actions
  (make-voiced (coda SYL))
PROGRESSIVE ASSIMILATION
Active? True
Domain? Syllable
Conditions
  (let ((coda-1 (last (coda SYL)))
        (onset-2 (first (onset (next SYL)))))
    (and
      (obstruent? coda-1)
      (not (voiced? coda-1))
      (fricative? onset-2) (voiced? onset-2)))
Actions
  (make-voiceless (onset (next SYL)))
Figure 3. A possible definition of voice assimilation rules in Dutch. The
LET syntax is used for local variable binding, but is not strictly needed.
SYL is bound to the current syllable.
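For readers less familiar with LISP notation, the two rules of Figure 3 can be paraphrased in Python. The sketch assumes that syllables are dictionaries held in a list (so that "next SYL" is the following list element) and uses a deliberately small obstruent inventory; neither assumption comes from GRAFON itself.

```python
STOPS = set("pbtdk")
FRICATIVES = set("fvsz")
VOICED_OF = {"p": "b", "t": "d", "f": "v", "s": "z"}
VOICELESS_OF = {v: k for k, v in VOICED_OF.items()}

def obstruent(p): return p in STOPS or p in FRICATIVES
def voiced(p): return p in VOICELESS_OF          # voiced obstruents only
def make_voiced(s): return "".join(VOICED_OF.get(p, p) for p in s)
def make_voiceless(s): return "".join(VOICELESS_OF.get(p, p) for p in s)

def _edge_phonemes(syls, i):
    """Return (coda-1, onset-2) as in Figure 3, or None if either is missing."""
    if i + 1 >= len(syls) or not syls[i]["coda"] or not syls[i + 1]["onset"]:
        return None
    return syls[i]["coda"][-1], syls[i + 1]["onset"][0]

def regressive_assimilation(syls, i):
    """A voiced stop voices the coda of the preceding syllable."""
    edge = _edge_phonemes(syls, i)
    if edge is None:
        return False
    coda_1, onset_2 = edge
    if (onset_2 in STOPS and voiced(onset_2)
            and obstruent(coda_1) and not voiced(coda_1)):
        syls[i]["coda"] = make_voiced(syls[i]["coda"])
        return True
    return False

def progressive_assimilation(syls, i):
    """A voiceless obstruent devoices a following voiced fricative onset."""
    edge = _edge_phonemes(syls, i)
    if edge is None:
        return False
    coda_1, onset_2 = edge
    if (obstruent(coda_1) and not voiced(coda_1)
            and onset_2 in FRICATIVES and voiced(onset_2)):
        syls[i + 1]["onset"] = make_voiceless(syls[i + 1]["onset"])
        return True
    return False

def show(syls):
    return "$".join(s["onset"] + s["nucleus"] + s["coda"] for s in syls)

asbest = [{"onset": "", "nucleus": "a", "coda": "s"},
          {"onset": "b", "nucleus": "e", "coda": "st"}]
regressive_assimilation(asbest, 0)
print(show(asbest))
```

As in the figure, the conditions inspect single phonemes at the syllable edge, while the actions rewrite the complete coda or onset.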
In a rule compiler approach (e.g. Kerkhoff, Wester and
Boves, 1984; Berendsen, Langeweg and van Leeuwen, 1986),
rules in a particular format (most often generative phonology)
are compiled into a program, thereby making a strict
distinction between the linguistic and computational parts of the
system. None of the existing systems incorporates a full
morphological analysis. The importance of morphological
boundaries is acknowledged, but actual analysis is restricted to a
number of (overgenerating) pattern matching rules. Another
serious disadvantage is that the user (the linguist) is restricted
in a compiler approach to the particular formalism the
compiler knows. It would be impossible, for instance, to
incorporate theoretical insights from autosegmental and metrical
phonology in a straightforward way into existing prototypes.
In GRAFON, on the other hand, the phonological knowledge
base can be easily extended with new objects and relations
between objects, and even at the level of the function and
predicate interface, some theoretical modelling can be done.
This flexibility is paid for, however, by higher demands on the
linguist working with the system, as he should be able to
write rules in a LISP-like applicative language. However, we
hope to have shown from the examples of rules in Figure 3 that
the complexity is not insurmountable.
7. APPLICATIONS 
Apart from its evident role as the linguistic part in a
text-to-speech system, GRAFON has also been used in other
applications.
7.1. Linguistic Tool 
One advantage of computer models of linguistic
phenomena is the framework they present for developing,
testing and evaluating linguistic theories. To be used as a
linguistic tool, a natural language processing system should at
least meet the following requirements: easy
modification of rules should be possible, and traces of rule
application should be made visible.
In GRAFON, rules can be easily modified both at the
macro level (reordering, removing and adding rules) and the
micro level (reordering, removing and adding conditions and
actions). The scope (domain) of a rule can be varied as
well. Possible domains at present are the syllable, the
morpheme, the word and the sentence. Furthermore, the
application of various rules to an input string is automatically traced
and this derivation can be made visible. For each
phonological rule, GRAFON keeps a list of all input strings to which
the rule applies. This is advantageous when complex rule
interactions must be studied. Figure 4 shows the user
interface with some output by the program. Apart from the
changing of rules, the derivation, and the example list for
each different rule, the system also offers menu-based
facilities for manipulating various parameters used in the
hyphenation, parsing and conversion algorithms, and for compiling
and showing statistical information on the distribution of
allophones and diphones in a corpus.
7.2. Dictionary Construction
Output of GRAFON was used (after manual checking)
by a Dutch lexicographic firm for the construction of the
pronunciation representation of Dutch entries in a
Dutch-French translation dictionary. The program turned out to be
easily adaptable to the requirements by blocking rules which
would lead to too much phonetic detail, and by changing the
domain of others (e.g. the scope of assimilation rules was
restricted to internal word boundaries). The accuracy of the
program on the 100,000 word corpus was more than 99%,
disregarding loan words. The phonemisation system also
plays a central role in the dynamic part of the lexical
database architecture we have described elsewhere
/Daelemans, 1987a/.
7.3. Spelling Error Correction 
A spelling error correction algorithm based on the idea
that people write what they hear if they do not know the
spelling of a word has been developed by Van Berkel /Van
Berkel and De Smedt, 1988/. A dictionary is used in which
the word forms have been transformed into phoneme
representations with a simplified and adapted version of
GRAFON. A possible error is transformed with the same
algorithm and matched to the dictionary entries. Combined
with a trigram (or rather triphone) method, this system can
correct both spelling and typing errors at a reasonable speed.
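The general idea (phonemise both the dictionary and the suspected error, then compare triphones) can be sketched in Python. This is not the algorithm of Van Berkel and De Smedt: crude_phonemise is a toy stand-in for the simplified GRAFON, the overlap score is a plain Jaccard measure chosen for this sketch, and the word list is invented.

```python
def crude_phonemise(word):
    """Toy stand-in for a phonemiser: merges a few homophonous spellings.
    E and A are ad hoc symbols for the diphthongs of ei/ij and au/ou."""
    w = word.lower()
    for spelling, symbol in (("ij", "E"), ("ei", "E"), ("ou", "A"), ("au", "A")):
        w = w.replace(spelling, symbol)
    if w.endswith("d"):              # final devoicing: paard sounds like paart
        w = w[:-1] + "t"
    return w

def triphones(code):
    """Set of overlapping three-symbol windows, with # as word edge."""
    padded = "#" + code + "#"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def build_index(words):
    return {w: triphones(crude_phonemise(w)) for w in words}

def correct(error, index):
    """Return the dictionary word whose triphones overlap most with the error."""
    target = triphones(crude_phonemise(error))
    def score(word):
        entry = index[word]
        return len(entry & target) / len(entry | target)
    return max(index, key=score)

index = build_index(["paard", "trein", "blauw"])
for attempt in ("paart", "trijn", "blouw", "trien"):
    print(attempt, "->", correct(attempt, index))
```

Spelling errors that sound right ("trijn") receive the same phoneme code as the intended word, while typing errors ("trien") are still caught through partial triphone overlap.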
8. IMPLEMENTATION AND ACCURACY 
GRAFON was written in ZetaLisp and Flavors and runs 
on a Symbolics Lisp Machine. The lexical database is stored 
on a SUN Workstation and organised indexed-sequentially. 
Accuracy measures (on randomly chosen Dutch text) are
encouraging: in a recent test on a 1000 word text, 99.26% of
phonemes and 97.62% of transcribed word tokens generated
by GRAFON were judged correct by an independent linguist.
The main source of errors by the program was the presence
of foreign words in the text (mostly of English and French
origin). Only a marginal number of errors was caused by
morphological analysis, syllabification or phonological rule
application.
There is at present one serious restriction on the system:
no syntactic analysis is available and therefore no
sophisticated intonation patterns and sentence accent can be
computed. Moreover, it is impossible to experiment with the Phi
(the phonological phrase, which may restrict sandhi processes
in Dutch) as a domain for phonological rules. However,
recently a theory has been put forward by Kager and Quené
(1987) in which it is claimed that sentence accent, Phi
boundaries and I (intonational phrase) boundaries can be computed
without exhaustive syntactic analysis. The information needed
is restricted to the difference between function and content
words, the category of function words, and the difference
between verbs and other content words. All this information
is accessible in the current implementation of GRAFON
through dictionary lookup.
[Figure 4: screen snapshot of the GRAFON user interface. Legible elements:
a command menu (Phonemise Input, Phonemise File, Hyphenate Input, Parse
Input, Grafon Loop, Show Rules, Derivation, Statistics, Dialects, Exit),
the input "het gieren van de herfststorm" with its derivation, and a rule
menu listing, among others, Progressive Assimilation, Regressive
Assimilation, Nasal Assimilation, Final Devoicing, Plosive-to-Fricative,
Palatalisation, Degemination, Cluster Reduction, Vowel Diphthongisation
and Schwa Insertion.]
Figure 4. Snapshot of the user interface to GRAFON. Top left, the system has
computed an internal representation and transcription of a sentence
fragment. A derivation is also printed. In the centre of the display, a menu
listing all phonological rules is shown. By clicking on a rule, the user
gets a list of input phrases to which the rule has been applied (middle
left). The same list of rules is also given in the top right menu. This
time, the application of individual rules can be blocked, and the result of
this can be studied. The chart bottom right shows the frequency distribution
of phonemes for the current session.
9. CONCLUSIONS 
It seems that high quality phonemisation for Dutch can
be achieved only by incorporating enough linguistic
knowledge (about syllable boundaries, internal word
boundaries etc.). GRAFON is a first step in this direction.
Although it lacks some sources of knowledge (notably about
sentence accent and syntactic structure), a transcription of
high quality and accuracy can already be obtained, and the
system was successfully applied in practical tasks like rule
testing, dictionary construction and spelling error correction.
At present, we are working on the integration of a
syntactic parser into GRAFON. This would make available the
phonological phrase as a domain, and would make the
computation of natural intonation patterns possible (using e.g. the
algorithm developed in Van Wijk and Kempen, 1987). The
alternative approach to the computation of phonological
phrase boundaries /Kager and Quené, 1987/ is also being
explored.
Another (more trivial) extension is the addition of
preprocessors for the verbal expansion of abbreviations and
numbers. The specifications of a lexical analyser providing
this functionality were provided in Daelemans (1987b). An
overview of the system including the modules we are
presently working on is given in Figure 5.
[Figure 5: diagram. Processing modules from top to bottom, with the
representation each produces for "het gieren van de herfststorm":
LEXICAL ANALYSIS
SYNTACTIC ANALYSIS               ((het gieren)(van de herfststorm))
MORPHOLOGICAL ANALYSIS           ##het##gieren##|van##de##herfst#storm##
SYLLABIFICATION                  ##het##gie$ren##|van##de##herfst#storm##
WORD STRESS ASSIGNMENT           ##het##+gie$ren##|+van##de##++herfst#+storm
TRANSLITERATION MAPPINGS
PHONOLOGICAL RULES               ət 'Xi:rə 'vɑn də 'hɛrəf,storəm
SENTENCE ACCENT ASSIGNMENT
INTONATION CONTOUR COMPUTATION   ət 'Xi:rə 'vɑn də 'hɛrəf,storəm]
Figure 5. Processing modules in an extended version of GRAFON.

REFERENCES 

Allen, J., M.S. Hunnicutt and D. Klatt. From Text to
Speech. Cambridge, UK: C.U.P., 1987.

Berendsen, E., S. Langeweg and H. van Leeuwen.
'Computational Phonology: Merged, not Mixed.' Proceedings of
COLING-86, 1986.

Berendsen, E. and J. Don. 'Morphology and stress in a
rule based grapheme-to-phoneme conversion system for
Dutch.' Proceedings European Conference on Speech
Technology, Edinburgh 1987.

Berkel, B. Van and K. De Smedt. 'Triphone analysis: A
Combined Method for the Correction of Orthographical
and Typographical Errors.' Proceedings 2nd ACL
Applied Conference, 1988.

Boot, M. Taal, tekst, computer. Katwijk: Servire, 1984.

Daelemans, W. 'GRAFON: An Object-oriented System for 
Automatic Grapheme to Phoneme Transliteration and 
Phonological Rule Testing.' Memo, University of 
Nijmegen, 1985. 

Daelemans, W. 'A Tool for the Automatic Creation, Exten- 
sion and Updating of Lexical Knowledge Bases.' 
Proceedings of the Third ACL European Chapter 
Conference, 1987a. 

Daelemans, W. Studies in Language Technology: An
Object-Oriented Computer Model of Morpho-phonological
Aspects of Dutch. Doctoral Dissertation, University of
Leuven, 1987b.

Daelemans, W. 'Automatic Hyphenation: Linguistics versus
Engineering.' In: F. Steurs and F.J. Heyvaert (Eds.),
Worlds behind Words, forthcoming 1988.

Kager, R. and H. Quené. 'Deriving prosodic sentence
structure without exhaustive syntactic analysis.' Proceedings
European Conference on Speech Technology, Edinburgh
1987.

Kerkhoff, J., J. Wester and L. Boves. 'A compiler for
implementing the linguistic phase of a text-to-speech
conversion system.' In: Bennis and Van Lessen Kloeke (eds),
Linguistics in the Netherlands, p. 111-117, 1984.

Lammens, J.M.G. 'A Lexicon-based Grapheme-to-phoneme 
Conversion System.' Proceedings European Conference 
on Speech Technology, Edinburgh 1987. 

Pounder, A. and M. Kommenda. 'Morphological Analysis 
for a German Text-to-speech system.' COLING '86, 
1986. 

Sejnowski, T.J. and C.R. Rosenberg. 'Parallel Networks that
Learn to Pronounce English Text.' Complex Systems 1,
1987, 145-168.

Stanfill, C. and D. Waltz. 'Toward Memory-based Reason- 
ing.' Communications of the ACM, 29 (12), 1986, 
1213-1228. 

Wijk, C. van and G. Kempen. 'From sentence structure to
intonation contour.' In: B. Muller (Ed.), Sprachsynthese.
Hildesheim: Georg Olms Verlag, 1985, p. 157-182.
