Modularity in Inductively-Learned Word Pronunciation Systems * 
Antal van den Bosch 1, Ton Weijters 2, Walter Daelemans 1 
1 ILK / Computational Linguistics 
Tilburg University 
P.O. Box 90153 
NL-5000 LE Tilburg 
The Netherlands 
{antalb,walter}@kub.nl
2 Department of Information Technology 
Eindhoven University of Technology 
P.O. Box 513 
NL-5600 MB Eindhoven 
The Netherlands 
A.J.M.M.Weijters@tm.tue.nl
Abstract 
In leading morpho-phonological theories and state-of-the-art text-to-speech systems it is assumed that word pronunciation cannot be learned or performed without in-between analyses at several abstraction levels (e.g., morphological, graphemic, phonemic, syllabic, and stress levels). We challenge this assumption for the case of English word pronunciation. Using IGTREE, an inductive-learning decision-tree algorithm, we train and test three word-pronunciation systems in which the number of abstraction levels (implemented as sequenced modules) is reduced from five, via three, to one. The latter system, classifying letter strings directly as mapping to phonemes with stress markers, yields significantly better generalisation accuracies than the two multi-module systems. Analyses of empirical results indicate that positive utility effects of sequencing modules are outweighed by cascading errors passed on between modules.
1 Introduction 
Learning word pronunciation can be a hard task when the relation between the spelling of a language and its corresponding pronunciation is many-to-many. The English writing system and its pronunciation are a notoriously complex example, caused by an apparent conflict between analogy and inconsistency:

Analogy. When two words or word chunks have a similar spelling, they tend to have a similar pronunciation. This tendency (which generalises to other language tasks as well) is usually referred to as the analogy principle (De Saussure, 1916; Yvon, 1996; Daelemans, 1996).
*This research was partially performed by the first and second author at the Department of Computer Science of the Universiteit Maastricht (The Netherlands), and partially in the context of the "Induction of Linguistic Knowledge" research programme, partially supported by the Foundation for Language Speech and Logic (TSL), funded by the Netherlands Organization for Scientific Research (NWO).
Inconsistency. Much of the analogy in English word pronunciation is disrupted by productive and complex word morphology, word stress, and graphematics.
Influential pre-Chomskyan linguistic theories have pointed to the analogy principle as the underlying principle for language learning (De Saussure, 1916), and to induction as the reasoning method for generalising from learned instances of language tasks to new instances through analogy (Bloomfield, 1933). However, methods and resources (e.g., computer technology) were not available then to demonstrate how induction through analogy could be employed to learn and model language tasks. Partly due to this lack of demonstrating power, Chomsky later stated

"... I don't see any way of explaining the resulting final state [of language learning] in terms of any proposed general developmental mechanism that has been suggested by artificial intelligence, sensorimotor mechanisms, or anything else" (Chomsky, in (Piatelli-Palmarini, 1980), p. 100).
Chomsky's argument is based on the assumption that generic learning methods such as induction cannot autonomously discover essential levels of abstraction in language processing tasks. Applied to morpho-phonology, the argument states that generic learning methods are not able to discover morphology, graphematics, and stress patterns autonomously when learning word pronunciation, although this knowledge appears essential. Phonological and morphological theories, influenced by Chomskyan theory across the board since the publication of SPE (Chomsky and Halle, 1968), have generally adopted the idea of abstraction levels in various guises (e.g., levels, tapes, tiers, grids) (Goldsmith, 1976; Liberman and Prince, 1977; Koskenniemi, 1984; Mohanan, 1986). Although there is no general consensus on which levels of abstraction can be discerned in phonology and morphology, there is rough, global agreement that words can be represented on different abstraction levels as
van den Bosch, Weijters and Daelemans 185 Modularity in Word Pronunciation Systems
Antal van den Bosch, Ton Weijters and Walter Daelemans (1998) Modularity in Inductively-Learned Word Pronunciation 
Systems. In D.M.W. Powers (ed.) NeMLaP3/CoNLL98: New Methods in Language Processing and Computational Natural Language Learning, ACL, pp 185-194. 
strings of letters, graphemes, morphemes, phonemes,
syllables, and stress patterns. 
According to these leading morpho-phonological theories, systems that (learn to) convert spelled words to phonemic words in one pass, i.e., without making use of abstraction levels, are assumed to be unable to generalise to new cases: going through the relevant abstraction levels is deemed essential to yield correct conversions of previously unseen words.
This assumption implies that if one wants to build 
a system that converts text to speech, one should 
implement explicitly the relevant levels of abstrac- 
tion. Such explicit implementations of abstraction 
levels can indeed be witnessed in many state-of-the- 
art speech synthesisers, implemented as (sequential) 
modules (Allen, Hunnicutt, and Klatt, 1987; Daele- 
mans, 1988). 
In this paper we challenge the assumption that 
levels of abstraction must be made explicit in learn- 
ing and performing the word-pronunciation task. 
We do this by applying an inductive-learning al- 
gorithm from machine learning to word pronunci- 
ation. From a wealth of existing algorithms in ma- 
chine learning (Mitchell, 1997), we choose IGTlt~B 
(Daelemans, Van den Bosch, and Weijters, 1997), an 
inductive-learning decision-tree learning algorithm. 
IGTR~.E is a fast algorithm which has been demon- 
strated to be applicable to language tasks (Van 
den Bosch and Daelemans, 1993; Van den Bosch, 
Daclemans, and Weijters, 1996; Daelemans, Van den 
Bosch, and Weijters, 1997). We construct IGTRE~ 
decision trees for word pronunciation, and perform 
empirical tests to estimate the trees' generalisation 
accuracy, i.e., their ability to process new, unseen 
word-pronunciation instances correctly. 
Rather than constructing and testing a single sys- 
tem, our approach is to test different modulari-
sations of the word-pronunciation task systemati- 
cally, to allow for an empirical comparison of word- 
pronunciation systems with and without the explicit 
learning of abstraction levels. First, we train (by 
inductive learning) and test a word-pronunciation 
model reflecting linguistic assumptions on abstrac- 
tion levels quite closely: the model is composed of 
five sequentially-coupled modules. Second, we train 
and test a model in which the number of modules 
is reduced to three, integrating two pairs of levels 
of abstraction. Third, we train and test a model 
performing word pronunciation in a single pass, i.e., 
without modular decomposition. 
The paper is structured as follows: first, in Sec- 
tion 2 we provide a description of IGTREE, the data 
on which IGTREE is trained and tested, and the
applied experimental methodology. Second, in Sec- 
tion 3 we introduce the three word-pronunciation 
systems, and for each system we describe the exper- 
iments performed and discuss the results obtained. 
In Section 4 we compare the three systems and anal- 
yse the consequences of modularisation. Section 5 
briefly mentions related work on inductive learning 
of word pronunciation. Section 6 summarises the 
results obtained and lists some points of discussion. 
2 Algorithm, Data, Methodology 
2.1 Algorithm: IGTREE 
IGTREE (Daelemans, Van den Bosch, and Weijters, 1997) is a top-down induction of decision trees (TDIDT) algorithm (Breiman et al., 1984; Quinlan, 1993). TDIDT is a widely-used method in supervised machine learning (Mitchell, 1997). IGTREE is designed as an optimised approximation of the instance-based learning algorithm IB1-IG (Daelemans and Van den Bosch, 1992; Daelemans, Van den Bosch, and Weijters, 1997). In IGTREE, information gain is used as a guiding function to compress a data base of instances of a certain task into a decision tree.¹ Instances are stored in the tree as paths of connected nodes ending in leaves which contain classification information. Nodes are connected via arcs denoting feature values. Information gain is used in IGTREE to determine the order in which feature values are added as arcs to the tree. Information gain is a function from information theory, and is used similarly in ID3 (Quinlan, 1986) and C4.5 (Quinlan, 1993).
The idea behind computing the information gain of features is to interpret the training set (i.e., the set of task instances for which all classifications are given and which are used for training the learning algorithm) as an information source capable of generating a number of messages (i.e., classifications) with a certain probability. The information entropy H of such an information source can be compared in turn, for each of the features characterising the instances (let n equal the number of features), to the average information entropy of the information source when the value of such a feature is known. Data-base information entropy H(D) is equal to the number of bits of information needed to know the classification given an instance. It is computed by equation 1, where p_i (the probability of classification i) is estimated by its relative frequency in the training set.

    H(D) = - Σ_i p_i log₂ p_i    (1)
To determine the information gain of each of the n features f_1 ... f_n, we compute the average information entropy for each feature and subtract it from the information entropy of the data base. To compute the information entropy for a feature f_i, given in equation 2, we take the weighted average information entropy of the data base restricted to each possible value of the feature. The expression D_[f_i=v_j]

¹IGTREE can function with any feature weighting method, such as gain ratio (Quinlan, 1993); for all experiments reported here, information gain was used.
refers to those patterns in the data base that have value v_j for feature f_i, j ranges over the possible values of f_i, and V_i is the set of possible values for feature f_i. Finally, |D| is the number of patterns in the (sub) data base.

    H(D_{f_i}) = Σ_{v_j ∈ V_i} H(D_[f_i=v_j]) · |D_[f_i=v_j]| / |D|    (2)

Information gain of feature f_i is then obtained by equation 3.

    G(f_i) = H(D) - H(D_{f_i})    (3)
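As a concrete illustration, the entropy and information-gain computations of equations 1-3 can be sketched in Python (a minimal sketch; the data representation and function names are our own assumptions, not the paper's implementation):

```python
import math
from collections import Counter

def entropy(labels):
    """H(D): bits needed to know the classification of an instance (eq. 1)."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(instances, labels, i):
    """G(f_i) = H(D) - H(D_{f_i}): database entropy minus the weighted
    average entropy of the database restricted to each value of feature i
    (eqs. 2-3)."""
    total = len(labels)
    by_value = {}
    for inst, label in zip(instances, labels):
        by_value.setdefault(inst[i], []).append(label)
    restricted = sum(len(sub) / total * entropy(sub)
                     for sub in by_value.values())
    return entropy(labels) - restricted
```

A feature that perfectly predicts the class receives a gain equal to the full database entropy; a feature carrying no information receives a gain of zero.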
In IGTREE, feature-value information is stored in the decision tree on arcs. The first feature values, stored as arcs connected to the tree's top node, are those representing the values of the feature with the highest information gain, followed at the second level of the tree by the values of the feature with the second-highest information gain, etc., until the classification information represented by a path is unambiguous. Knowing the value of the most important feature may already uniquely identify a classification, in which case the other feature values of that instance need not be stored in the tree. Alternatively, it may be necessary for disambiguation to store a long path in the tree.
Apart from storing uniquely identified class labels at leaves, IGTREE stores at each non-terminal node information on the most probable classification given the path so far. The most probable classification is the most frequently occurring classification in the subset of instances being compressed in the path being expanded. Storing the most probable class at non-terminal nodes is essential when processing new instances. Processing a new instance involves traversing the tree by matching the feature values of the test instance with arcs of the tree, in the order of the features' information gain. Traversal ends when (i) a leaf is reached or when (ii) matching a feature value with an arc fails. In case (i), the classification stored at the leaf is taken as output. In case (ii), we use instead the most probable classification stored at the non-terminal node most recently visited.
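The traversal logic just described can be sketched as follows (our own minimal reconstruction, not the IGTREE source; the node layout and names are assumptions):

```python
class Node:
    """An IGTREE node: arcs keyed by feature value, plus the most probable
    class given the path so far (the back-off stored at non-terminal nodes)."""
    def __init__(self, default, arcs=None):
        self.default = default   # most probable (or unique) classification
        self.arcs = arcs or {}   # feature value -> child Node; empty = leaf

def classify(root, instance, feature_order):
    """Match feature values in order of decreasing information gain.
    Case (i): a leaf is reached -> its stored class is the output.
    Case (ii): no arc matches -> back off to the default of the node
    most recently visited."""
    node = root
    for f in feature_order:
        if not node.arcs:                  # case (i): leaf reached
            break
        child = node.arcs.get(instance[f])
        if child is None:                  # case (ii): match fails
            break
        node = child
    return node.default
```
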
2.2 Data Acquisition and Preprocessing 
The resource of word-pronunciation instances used in our experiments is the CELEX lexical data base of English (Burnage, 1990). All items in the CELEX data bases contain hyphenated spelling, syllabified and stressed phonemic transcriptions, and detailed morphological analyses. We extracted from the English data base of CELEX all the above information, resulting in a data base containing 77,565 unique items (word forms with syllabified, stressed pronunciations and morphological segmentations).
For use in experiments with learning algorithms, 
the data is preprocessed to derive fixed-size in- 
stances. In the experiments reported in this paper 
different morpho-phonological (sub)tasks are inves- 
tigated; for each (sub)task, an instance base (train- 
ing set) is constructed containing instances produced 
by windowing (Sejnowski and Rosenberg, 1987) and
attaching to each instance the classification appro- 
priate for the (sub)task under investigation. Table 1 
displays example instances derived from the sample 
word booking. With this method, for each (sub) task 
an instance base of 675,745 instances is built. 
In the table, six classification fields are shown, one of which is a composite field; each field refers to one of the (sub)tasks investigated here. M stands for morphological decomposition: determine whether a letter is the initial letter of a morpheme (class '1') or not (class '0'). A is graphemic parsing²: determine whether a letter is the first or only letter of a grapheme (class '1') or not (class '0'); a grapheme is a cluster of one or more letters mapping to a single phoneme. G is grapheme-phoneme conversion: determine the phonemic mapping of the middle letter. Y is syllabification: determine whether the middle phoneme is syllable-initial. S is stress assignment: determine the stress level of the middle phoneme. Finally, GS is integrated grapheme-phoneme conversion and stress assignment. The example instances in Table 1 show that each (sub)task is phrased as a classification task on the basis of windows of letters or phonemes (the stress assignment task S is investigated with both letters and phonemes as input). Each window represents a snapshot of a part of a word or phonemic transcription, and is labelled by the classification associated with the middle letter of the window. For example, the first letter-window instance ___book is linked with label '1' for the morphological segmentation task (M), since the middle letter b is the first letter of the morpheme book; the other instance labelled with morphological-segmentation class '1' is the instance with i in the middle, since i is the first letter of the (inflectional) morpheme ing. Classifications may either be binary ('1' or '0') for the segmentation tasks (M, A, and Y), or have more values, such as 62 possible phonemes (G) or three stress markers (primary, secondary, or no stress; S), or a combination of these classes (159 combined phonemes and stress markers, GS).
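The windowing procedure can be sketched as follows (a sketch assuming a 3-1-3 window with '_' as the padding symbol; the function name is ours):

```python
def letter_windows(word, left=3, right=3, pad='_'):
    """Slide a fixed-size window over a word so that each letter appears
    once as the focus position, with padded context at the word edges."""
    padded = pad * left + word + pad * right
    width = left + 1 + right
    return [padded[i:i + width] for i in range(len(word))]
```

For booking this yields seven 7-letter instances, from '___book' (focus b) to 'king___' (focus g); each instance would then be paired with the classification of its middle letter for the (sub)task at hand.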
2.3 Methodology 
Our empirical study focuses on measuring the ability of the IGTREE learning algorithm to use the knowledge accumulated during learning for the classification of new, unseen instances of the same (sub)task, i.e., we measure their generalisation accuracy. Weiss and Kulikowski (1991) describe n-fold cross validation (n-fold CV) as a procedure for mea-

²Graphemic parsing is not represented in the CELEX data. We used an automatic alignment algorithm (Daelemans and Van den Bosch, 1997) to determine which letters are the first or only letters of a grapheme.
instance  letter-window instances                classifications
number    left context  focus  right context     M  A   G    S   GS
1         _  _  _       b      o  o  k           1  1  /b/   1  /b/1
2         _  _  b       o      o  k  i           0  1  /u/   0  /u/0
3         _  b  o       o      k  i  n           0  0  /-/   0  /-/0
4         b  o  o       k      i  n  g           0  1  /k/   0  /k/0
5         o  o  k       i      n  g  _           1  1  /I/   0  /I/0
6         o  k  i       n      g  _  _           0  1  /N/   0  /N/0
7         k  i  n       g      _  _  _           0  0  /-/   0  /-/0

instance  phoneme-window instances                    classifications
number    left context    focus  right context        Y  S
1          _    _    _    /b/    /u/  /-/  /k/        1  1
2          _    _   /b/   /u/    /-/  /k/  /I/        0  0
3          _   /b/  /u/   /-/    /k/  /I/  /N/        0  0
4         /b/  /u/  /-/   /k/    /I/  /N/  /-/        1  0
5         /u/  /-/  /k/   /I/    /N/  /-/   _         0  0
6         /-/  /k/  /I/   /N/    /-/   _    _         0  0
7         /k/  /I/  /N/   /-/     _    _    _         0  0

Table 1: Example of instances generated from the word booking, with classifications for all of the subtasks investigated, viz. M, A, G, Y, S, and GS.
suring generalisation accuracy. For our experiments with IGTREE, we set up 10-fold CV experiments consisting of five steps. (i) On the basis of a data set, n partitionings are generated of the data set into one training set containing ((n-1)/n)th of the data set, and one test set containing (1/n)th of the data set, per partitioning. For each partitioning, the three following steps are repeated: (ii) Information-gain values for all (seven) features are computed on the basis of the training set (cf. Subsection 2.1). (iii) IGTREE is applied to the training set, yielding an induced decision tree (cf. Subsection 2.1). (iv) The tree is tested by letting it classify all instances in the test set, which results in a percentage of incorrectly classified test instances. (v) When each of the n folds has produced an error percentage on test material, a mean generalisation error of the learned model is computed. Weiss and Kulikowski (1991) argue that by using n-fold CV, preferably with n ≥ 10, one can retrieve a good estimate of the true generalisation error of a learning algorithm given an instance base. Mean results can be employed further in significance tests. In our experiments, n = 10, and one-tailed t-tests are performed.
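The five steps above can be sketched as follows (a simplified sketch; the partitioning scheme and the train-and-test routine are stand-ins for the actual experimental setup, not its implementation):

```python
import random

def n_fold_cv(data, train_and_test, n=10, seed=0):
    """Steps (i)-(v): partition the data into n folds, train on (n-1)/n
    and test on 1/n of the data per fold, and collect the per-fold error
    percentages plus their mean."""
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)
    folds = [indices[k::n] for k in range(n)]
    errors = []
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in indices if i not in held_out]
        test = [data[i] for i in fold]
        errors.append(train_and_test(train, test))  # % misclassified
    mean_error = sum(errors) / n
    return errors, mean_error
```

The per-fold errors returned here are what the one-tailed t-tests in the comparisons below would operate on.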
3 Three word-pronunciation 
architectures 
Our experiments are grouped in three series, each involving the application of IGTREE to a particular word-pronunciation system. The architectures of these systems are displayed in Figure 1. In the following subsections, each system is introduced, an outline is given of the experiments performed on the system, and the results are briefly discussed.
3.1 M-A-G-Y-S 
The architecture of the M-A-G-Y-S system is inspired by SOUND1 (Hunnicutt, 1976; Hunnicutt, 1980), the word-pronunciation subsystem of the MITALK text-to-speech system (Allen, Hunnicutt, and Klatt, 1987). When the MITALK system is faced with an unknown word, SOUND1 produces on the basis of that
word a phonemic transcription with stress markers (Allen, Hunnicutt, and Klatt, 1987). This word-pronunciation process is divided into the following five processing components:
1. morphological segmentation, which we implement as the module referred to as M;
2. graphemic parsing, module A;
3. grapheme-phoneme conversion, module G;
4. syllabification, module Y;
5. stress assignment, module S.
The architecture of the M-A-G-Y-S system is visualised in the left of Figure 1. It can be seen that the representations include direct output from previous modules, as well as representations from earlier modules. For example, the S module takes as input the syllable boundaries generated by the Y module, but also the phoneme string generated by the G module, and the morpheme boundaries generated by the M module.
M-A-G-Y-S is put to the test by applying IGTREE in 10-fold CV experiments to the five subtasks, connecting the modules after training, and measuring the combined score on correctly classified phonemes and stress markers, which is the desired output of the word-pronunciation system. An individual module can be trained on data from CELEX directly as input, but this method ignores the fact that modules in a working modular system can be expected to generate some amount of error. When one module generates an error, the subsequent module receives this error as input, assumes it is correct, and may generate another error. In a five-module system, this type of cascading error may seriously hamper generalisation accuracy. To counteract this potential disadvantage, modules can also be trained on the output of previous modules. Modules cannot be expected to learn to repair completely random, irregular errors, but whenever a previous module makes consistent errors on a specific input, this may be recognised by the subsequent module. Having detected a consistent error, the subsequent module is
[Figure 1: diagrams of the three architectures, each mapping a written word to a phonemic transcription with stress. Legend: M = morphological analysis; A = graphemic parsing; G = grapheme-phoneme conversion; Y = syllabification; S = stress assignment; GS = combined grapheme-phoneme conversion and stress assignment.]
Figure 1: Architectures of the three investigated word-pronunciation systems. Left: M-A-G-Y-S; middle: M-G-S; right: GS. Rectangular boxes represent modules; the letter in the box corresponds to the subtask as listed in the legends (far right). Arrows depict data flows from the raw input or a module, to a module or the output.
[Figure 2: bar chart of generalisation error percentages per subtask; the joint PS error is 10.59%.]
Figure 2: Generalisation errors on the M-A-G-Y-S system in terms of the percentage of incorrectly classified test instances by IGTREE on the five subtasks M, A, G, Y, and S, and on phonemes and stress markers jointly (PS).
then able to repair the error and continue with successful processing. Earlier experiments performed on the tasks investigated in this paper have shown that classification errors on test instances are indeed consistently and significantly decreased when modules are trained on the output of previous modules rather than on data extracted directly from CELEX (Van den Bosch, 1997). Therefore, we train the M-A-G-Y-S system, with IGTREE, by training the modules of the system on the output of preceding modules. We henceforth refer to this type of training as adaptive training, referring to the adaptation of a module to the errors of a preceding module.

Figure 2 displays the results obtained with IGTREE under the adaptive variant of M-A-G-Y-S. The figure shows all percentages (displayed above the bars; error bars on top of the main bars indicate standard
deviations) of incorrectly classified instances for each 
of the five subtasks, and a joint error on incorrectly 
classified phonemes with stress markers, which is the 
desired output of the system. The latter classifica- 
tion error, labelled PS in Figure 2, regards classifi- 
cation of an instance as incorrect if either or both 
of the phoneme and stress marker is incorrect. The 
figure shows that the joint error on phonemes and 
stress markers is 10.59% of test instances, on aver- 
age. Computed in terms of transcribed words, only 
35.89% of all test words are converted to stressed 
phonemic transcriptions flawlessly. The joint error 
is lower than the sum of the errors on the G subtask 
and the s subtask, 12.95%, suggesting that about 
20% of the incorrectly classified test instances in- 
volve an incorrect classification of both the phoneme 
and the stress marker. 
3.2 M-G-S
The subtasks of graphemic parsing (A) and grapheme-phoneme conversion (G) are clearly related. While A attempts to parse a letter string into graphemes, G converts graphemes to phonemes. Although they are performed independently in M-A-G-Y-S, they can be integrated easily when the class-'1' instances of the A task are mapped to their associated phoneme rather than '1', and the class-'0' instances are mapped to a phonemic null, /-/, rather than '0' (cf. Table 1). This task integration is also used in the NETTALK model (Sejnowski and Rosenberg, 1987). A similar argument can be made for integrating the syllabification and stress assignment modules into a single stress-assignment module. Stress markers, in our definition of the stress-assignment subtask, are placed solely on the positions which are also marked as syllable boundaries (i.e., on syllable-initial phonemes). Removing the
[Figure 3: bar chart of generalisation error percentages per subtask; the joint PS error is 7.86%.]
Figure 3: Generalisation errors on the M-G-S system in terms of the percentage of incorrectly classified test instances by IGTREE on the three subtasks M, G, and S, and on phonemes and stress markers jointly (PS).
[Figure 4: bar chart with errors of 3.79% (phonemes), 3.97% (stress markers), and 7.41% (PS).]
Figure 4: Percentage of generalisation errors made by IGTREE on the GS task, in terms of the percentage of incorrectly classified test instances as well as on phonemes and stress assignments computed separately.
syllabification subtask makes finding those syllable boundaries which are relevant for stress assignment an integrated part of stress assignment. Syllabification (Y) and stress assignment (S) can thus be integrated in a single stress-assignment module S.
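The two integrations can be sketched as simple class relabellings (a sketch; the class symbols follow Table 1, while the function names are our own):

```python
PHONEMIC_NULL = '-'

def integrate_parsing_into_conversion(a_class, phoneme):
    """Fold graphemic parsing (A) into grapheme-phoneme conversion (G):
    a class-'1' instance maps to its phoneme, a class-'0' instance to the
    phonemic null /-/."""
    return phoneme if a_class == '1' else PHONEMIC_NULL

def integrate_syllabification_into_stress(y_class, stress):
    """Fold syllabification (Y) into stress assignment (S): stress markers
    occur only on syllable-initial phonemes (Y class '1')."""
    return stress if y_class == '1' else '0'
```
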
When both pairs of modules are reduced to single modules, the three-module system M-G-S is obtained. Figure 1 displays the architecture of the M-G-S system in the middle. Experiments on this system are performed analogously to the experiments with the M-A-G-Y-S system; Figure 3 displays the average percentages of generalisation errors generated by IGTREE on the three subtasks and on phonemes and stress markers jointly (the error bar labelled PS).
Removing graphemic parsing (A) and syllabification (Y) as explicit in-between modules yields better accuracies on the grapheme-phoneme conversion (G) and stress assignment (S) subtasks than in the M-A-G-Y-S system. Both differences are significant; for G, (t(19) = 43.70, p < 0.001), and for S (t(19) = 32.00, p < 0.001). The joint accuracy on phonemes and stress markers is also significantly better in the M-G-S system than in the M-A-G-Y-S system (t(19) = 37.50, p < 0.001). Differently from M-A-G-Y-S, the sum of the errors on phonemes and stress markers, 8.09%, is hardly more than the joint error on PSs, 7.86%: there is hardly any overlap in instances with incorrectly classified phonemes and stress markers. The percentage of flawlessly processed test words is 44.89%, which is markedly better than the 35.89% of M-A-G-Y-S.
3.3 GS 
GS is a single-module system in which only one clas- 
sification task is performed in one pass. The GS 
task integrates grapheme-phoneme conversion and 
stress assignment: to classify letter windows as cor- 
responding to a phoneme wi~h a stress marker (PS). 
In the GS system, a PS can be either (i) a phoneme 
or a phonemic null with stress marker '0', or (ii) 
a phoneme with stress marker '1' (i.e., the first 
phoneme of a syllable receiving primary stress), or 
(iii) a phoneme with stress marker '2' (i.e., the first 
phoneme of a syllable receiving secondary stress). 
The simple architecture of GS, which does not reflect 
any linguistic expert knowledge about decomposi- 
tions of the word-pronunciation task, is visualised 
as the rightmost architectaxe in Figure 1. It only 
assumes the presence of letters at the input, and 
phonemes and stress maxkers at the output. Ta- 
ble 1 displays example instance PS classifications 
generated on the basis of the word booking. The 
phonemes with stress markers (PSs) axe denoted by 
composite labels. For example, the first instance in 
Table 1, __book, maps to class label ~b/l, denot- 
ing a/b/ which is the first phoneme of a syllable 
receiving primary stress. 
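Composing and decomposing the 159 composite PS class labels can be sketched as follows (our own illustration of the composite-label scheme, not the paper's code):

```python
def make_ps_label(phoneme, stress):
    """Compose a PS class label from a phoneme symbol (or the phonemic
    null '-') and a stress marker: '0' none, '1' primary, '2' secondary."""
    return phoneme + stress

def split_ps_label(label):
    """Recover the phoneme and the stress marker from a composite label."""
    return label[:-1], label[-1]
```
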
The experiments with GS were performed with the same data set of word pronunciations as used with M-A-G-Y-S and M-G-S. The number of PS classes (i.e., all possible combinations of phonemes and stress markers) occurring in this data base is 159. Figure 4 displays the generalisation errors in terms of incorrectly classified test instances. The figure also displays the percentage of classification errors made on phonemes and stress markers computed separately.

IGTREE yields significantly better generalisation accuracy on phonemes and stress markers, both jointly and independently. In terms of PSs, the accuracy on GS is significantly better than that of M-G-S with (t(19) = 40.48, p < 0.001), and that of M-A-G-Y-S with (t(19) = 6.90, p < 0.001). Its accuracy on flawlessly transcribed test words, 59.38%, is also considerably better than that of the modular sys-
tems. Compared to accuracies reported in related research on learning English word pronunciation (Sejnowski and Rosenberg, 1987; Wolpert, 1990; Dietterich, Hild, and Bakiri, 1995; Yvon, 1996) and on general quality demands of text-to-speech applications, an error of 3.79% on phonemes and 30.62% on words can be considered adequate, though still not excellent (Yvon, 1996; Van den Bosch, 1997).

Figure 5: Average numbers of nodes in the decision trees generated by IGTREE for the M-A-G-Y-S, M-G-S, and GS systems. Compartments indicate the numbers of nodes needed for the trees of the subtasks specified by their labels.
4 Comparisons of M-A-G-Y-S, 
M-G-S, and GS 
We have given significance results showing that, un- 
der our experimental conditions and using IGTREE 
as the learning algorithm, optimal generalisation ac- 
curacy on word pronunciation is obtained with GS, 
the system that does not incorporate any explicit 
decomposition of the word-pronunciation task. In 
this section we perform two additional comparisons 
of the three systems. First, we compare the sizes of 
the trees constructed by IGTREE on the three sys- 
tems; second, we analyse the positive and negative 
effects of learning the subtasks in their specific sys- 
tems' context. 
Tree sizes 
An advantage of using less or no decompositions in 
terms ofcomputationul ei~ciency is the total amount 
of memory needed for storing the trees. Although 
the applieation of IGTREE generally results in small 
trees that fit well inside small computer memories 
(for out modulax (sub)tasks, tree sizes waxy from 
64,821 nodes for the M-modules to 153,678 nodes 
for the G-module in M-A-G-Y-S, occupying 453,747 
to 1,075,746 bytes of memory), keeping five trees in 
memory would not be a desirable feature for a sys- 
tem optimised on memory use. Figure 5 displays 
the summed number of nodes for each of the four 
IGTReE-tramed systems under the adaptive vaxiant. 
Each bax is divided into compartments indicating 
the amount of nodes in the trees generated for each 
of the modular subtasks. 
Figure 5 shows that the model with the best generalisation accuracy, GS, is also the model taking up the smallest number of nodes. The number of nodes in the single GS tree, 111,062, is not only smaller than the sum of the numbers of nodes needed for the G and S modules in the M-G-S system (204,345 nodes); it is even smaller than the single tree constructed for the G subtask in the M-G-S system (125,182 nodes).
A minor difference in tree size can be seen between
the trees built for the G-module in the M-G-S system,
125,182 nodes, and the G-module in the M-A-G-Y-S
system, 153,678 nodes. A similar difference can be
seen for the S-modules, taking up 79,163 nodes in
the M-G-S system, and 96,998 nodes in the M-A-G-
Y-S system. The size of the trees built for modules
appears to increase when the module is preceded by
more modules, which suggests that IGTREE is faced
with a more complex task, including potentially er-
roneous output from more modules, when building
a tree for a module further down a sequence of mod-
ules.
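The size comparisons made in the two paragraphs above can be restated compactly; a sketch using only the node counts quoted in the text:

```python
# Node counts quoted in this section, collected for comparison.
gs_tree = 111_062                    # single tree of the GS system
mgs_g, mgs_s = 125_182, 79_163       # G and S modules of M-G-S
magys_g, magys_s = 153_678, 96_998   # G and S modules of M-A-G-Y-S

# GS is smaller than the G and S modules of M-G-S combined ...
assert gs_tree < mgs_g + mgs_s       # 111,062 < 204,345
# ... and even smaller than the G module of M-G-S alone.
assert gs_tree < mgs_g               # 111,062 < 125,182
# Trees for the same subtask grow when more modules precede them.
assert magys_g > mgs_g and magys_s > mgs_s
```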
Utility effects 
The particular sequence of the five modules in
the M-A-G-Y-S system reflects a number of assump-
tions on the utility of using output from one subtask
as input to another subtask. Morphological knowl-
edge is useful as input to grapheme-phoneme conver-
sion (e.g., to avoid pronouncing ph in loophole as /f/,
or red in barred as /red/); graphemic parsing is use-
ful as input to grapheme-phoneme conversion (e.g.,
to avoid the pronunciation of gh in through); etc.
Thus, feeding the output of a module A into a subse-
quent module B implies that one expects to perform
better on module B with A's input than without.
The accuracy results obtained with the modules of
the M-A-G-Y-S, M-G-S, and GS systems can serve as
tests for their respective underlying utility assump-
tions, when they are compared to the accuracies ob-
tained with their subtasks learned in isolation.
To measure the utility effects of including the out-
puts of modules as inputs to other modules, we per-
formed the following experiments:
1. We applied IGTREE in 10-fold CV experiments to
each of the five subtasks M, A, G, Y, and S, only
using letters (with the M, A, G, and S subtasks)
or phonemes (with the Y and the S subtasks)
as input, and their respective classifications as
output (cf. Table 1). The input is directly ex-
tracted from CELEX. These experiments pro-
vide the baseline score for each subtask, and
are referred to as the isolated experiments.
2. We applied IGTREE in 10-fold CV experiments
to all subtasks of the M-A-G-Y-S, M-G-S, and GS
systems, training and testing on input extracted
directly from CELEX. The results from these ex-
periments reflect what would be the accuracy of
                     generalisation error (%)
            isolated   ideal (utility)   actual (utility)
M-A-G-Y-S
  M           5.14     5.14 ( 0.00)      5.14 ( 0.00)
  A           1.39     1.66 (-0.27)      1.50 (-0.11)
  G           3.72     3.68 (+0.04)      7.67 (-3.95)
  Y           0.45     0.75 (-0.30)      2.63 (-2.16)
  S           7.96     2.67 (+5.29)      5.28 (+2.68)
M-G-S
  M           5.14     5.14 ( 0.00)      5.14 ( 0.00)
  G           3.72     3.66 (+0.06)      3.99 (-0.27)
  S           7.96     3.97 (+3.99)      4.10 (+3.86)
GS
  G           3.72        -              3.79 (-0.07)
  S           4.71        -              3.97 (+0.74)

Table 2: Overview of utility effects of learning sub-
tasks (M, A, G, Y, and S) as modules or partial tasks
in the M-A-G-Y-S, M-G-S, and GS systems. For each
module, in each system, the utility of training the
module with ideal data (middle) and actual, modu-
lar data under the adaptive variant (right), is com-
pared against the accuracy obtained with learning
the subtasks in isolation (left). Accuracies are given
as percentages of incorrectly classified test instances.
the modular systems when each module would
perform flawlessly. We refer to these ex-
periments as the ideal experiments.
With the results of these experiments we mea-
sure, for each subtask in each of the three systems,
the utility effect of including the input of preceding
modules, for the ideal case (with input straight from
CELEX) as well as for the actual case (with input
from preceding modules). A utility effect is the dif-
ference between IGTREE's generalisation error on the
subtask in modular context (either ideal or actual)
and its accuracy on the same subtask in isolation.
Table 2 lists all computed utility effects.
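The definition above amounts to a simple subtraction; a sketch using the M-G-S figures copied from Table 2 (positive utility means the modular input helps):

```python
# Utility effect: IGTREE's isolated error minus its error in modular
# context, here for the actual M-G-S system under the adaptive variant.
isolated = {"M": 5.14, "G": 3.72, "S": 7.96}
actual   = {"M": 5.14, "G": 3.99, "S": 4.10}

for subtask in isolated:
    utility = isolated[subtask] - actual[subtask]
    print(f"{subtask}: {utility:+.2f}")  # M: +0.00, G: -0.27, S: +3.86
```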
For the case of the M-A-G-Y-S system, it can
be seen that the only large utility effect, even in
the ideal case, could be obtained with the stress-
assignment subtask. In the isolated case, the input
consists of phonemes; in the M-A-G-Y-S system, the
input contains morpheme boundaries, phonemes,
and syllable boundaries. The ideal positive effect
on the S module of 5.29% fewer errors turns out
to be a positive effect of 2.68% in the actual sys-
tem. The latter positive effect is outweighed by a
rather large negative utility effect on the grapheme-
phoneme conversion task of -3.95%. Neither the A
nor the Y subtask profits from morphological bound-
aries as input, even in the ideal case; in the actual M-
A-G-Y-S system, the utility effect of including mor-
phological boundaries from M and phonemes from G
in the syllabification module Y is markedly negative:
-2.16%.
In the M-G-S system, the utility effects are gen-
erally less negative than in the M-A-G-Y-S system.
There is a small positive utility effect in the ideal
case of including morphological boundaries as input
to grapheme-phoneme conversion; in the actual M-
G-S system, the utility effect is negative (-0.27%).
The stress-assignment module benefits from includ-
ing morphological boundaries and phonemes in its
input, both in the ideal case and in the actual M-G-
S system.
The GS system does not contain separate mod-
ules, but it is possible to compare the errors made
on phonemes and stress assignments separately to
the results obtained on the subtasks learned in isola-
tion. Grapheme-phoneme conversion is learned with
almost the same accuracy when learned in isolation
as when learned as a partial task of the GS task. Learn-
ing the grapheme-phoneme task, IGTREE is neither
helped nor hampered significantly by learning stress
assignment simultaneously. There is a positive util-
ity effect in learning stress assignment, however.
When stress assignment is learned in isolation with
letters as input, IGTREE classifies 4.71% of test in-
stances incorrectly, on average. (This is a lower error
than obtained with learning stress assignment on the
basis of phonemes, indicating that stress assignment
should take letters as input rather than phonemes.)
When the stress-assignment task is learned along
with grapheme-phoneme conversion in the GS sys-
tem, a marked improvement is obtained: 0.74% fewer
classification errors are made.
Summarising, comparing the accuracies on modu-
lar subtasks to the accuracies on their isolated coun-
terpart tasks shows only a few positive utility effects
in the actual systems, all obtained with stress as-
signment. The largest utility effect is found on the
stress-assignment subtask of M-G-S. However, this
positive utility effect does not lead to optimal ac-
curacy on the S subtask; in the GS system, stress
assignment is performed with letters as input, yield-
ing the best accuracy on stress assignment in our
investigations, viz. 3.97% incorrectly classified test
instances.
5 Related work 
The classical NETTALK paper by Sejnowski and
Rosenberg (1987) can be seen as a primary source
of inspiration for the present study, as it has been
for a considerable amount of related work. Although
it has been criticised for being vague and presumptu-
ous and for presenting generalisation accuracies that
can be improved easily with other learning meth-
ods (Stanfill and Waltz, 1986; Wolpert, 1990; Weij-
ters, 1991; Yvon, 1996), it was the first paper to
investigate grapheme-phoneme conversion as an in-
teresting application for general-purpose learning al-
gorithms. However, few reports have been made on
the joint accuracies on stress markers and phonemes
in work on the NETTALK data. To our knowledge,
only Shavlik, Mooney, and Towell (1991) and Di-
etterich, Hild, and Bakiri (1995) provide such re-
ports. In terms of incorrectly processed test in-
stances, Shavlik, Mooney, and Towell (1991) ob-
tain better performance with the back-propagation
algorithm trained on distributed output (27.7% er-
rors) than with the ID3 (Quinlan, 1986) decision-tree
algorithm (34.7% errors), both trained and tested
on small non-overlapping sets of about 1,000 in-
stances. Dietterich, Hild, and Bakiri (1995) re-
port similar errors on similarly-sized training and
test sets (29.1% for BP and 34.4% for ID3); with a
larger training set of 19,003 words from the NETTALK
data and an input encoding fifteen letters, previous
phoneme and stress classifications, some domain-
specific features, and error-correcting output codes,
ID3 generates 8.6% errors on test instances (Diet-
terich, Hild, and Bakiri, 1995), which does not com-
pare favourably to the results obtained with the
NETTALK-like GS task (a valid comparison cannot
be made; the data employed in the current study
contain considerably more instances).
An interesting counterargument against the repre-
sentation of the word-pronunciation task using fixed-
size windows, put forward by Yvon (1996), is
that an inductive-learning approach to grapheme-
phoneme conversion should be based on associating
variable-length chunks of letters with variable-length
chunks of phonemes. The chunk-based approach
is shown to be applicable, with adequate accu-
racy, to several corpora, including corpora of French
word pronunciations and, as mentioned above, the
NETTALK data (Yvon, 1996). Experiments on other
(larger) corpora, comparing both approaches, would
be needed to analyse their differences empirically.
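As an illustration only, the chunk-based idea can be caricatured as greedy longest-match lookup in a chunk lexicon; the lexicon and phoneme symbols below are invented, and Yvon's actual formalisation is considerably richer:

```python
# Toy chunk lexicon: letter chunks mapped to phoneme chunks (invented).
chunks = {"ph": "f", "oo": "u:", "th": "T", "ough": "u:", "r": "r",
          "l": "l", "e": "@", "o": "Q", "h": "h"}

def pronounce(word):
    """Cover the word greedily with the longest known letter chunks and
    concatenate their phoneme chunks."""
    phonemes, i = [], 0
    while i < len(word):
        for size in range(len(word) - i, 0, -1):
            chunk = word[i:i + size]
            if chunk in chunks:
                phonemes.append(chunks[chunk])
                i += size
                break
        else:
            raise KeyError(f"no chunk covers {word[i:]!r}")
    return "".join(phonemes)

print(pronounce("through"))   # -> 'Tru:'
# Note that the greedy matcher pronounces ph in loophole as /f/ -- the very
# mistake that morphological knowledge was argued to prevent in section 4.
print(pronounce("loophole"))  # -> 'lu:fQl@'
```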
6 Discussion 
We have demonstrated that a decision-tree learning
algorithm, IGTREE, is able to learn English word pro-
nunciation with modest to adequate generalisation
accuracy: the less the learning task is decomposed
into subtasks, the more adequate the generalisation
accuracy obtained by IGTREE is. The best generalisation
accuracy is obtained with the GS system, which does
not decompose the task at all. The general disad-
vantage of the investigated modular systems is that
modules do not perform their tasks flawlessly, while
their expert-based decompositions do assume flaw-
less performance. In practice, modules produce a
considerable number of irregular errors which cause
subsequent modules to generate subsequent 'cascad-
ing' errors. Only the subtask of stress assignment is
shown to be learned more successfully on the basis
of modular input.
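The cascading-error argument can be illustrated with a back-of-the-envelope calculation; the per-module accuracy below is an invented round number, and the independence assumption is itself optimistic (irregular errors may interact worse than this):

```python
# If each module in a sequence makes independent errors, the chance that a
# decision survives the whole pipeline shrinks with every added module.
per_module_accuracy = 0.96  # invented, not a measurement from the paper
for n_modules in (1, 3, 5):
    print(f"{n_modules} module(s): {per_module_accuracy ** n_modules:.3f}")
```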
The best-performing system, GS, is trained to map
windows of letters to combined class labels repre-
senting phonemes and stress markers. Compared
to the M-A-G-Y-S and M-G-S systems, the GS sys-
tem (i) lacks an explicit morphological segmenta-
tion and (ii) learns stress assignment jointly with
grapheme-phoneme conversion on the basis of let-
ter windows rather than phoneme windows. These
two advantageous properties of the GS system lead
to three suggestions. First, it appears better to leave
morphological segmentation an implicit subtask; it
can be left to the learning algorithm to extract the
morphological information needed to dis-
ambiguate between alternative pronunciations di-
rectly from the letter-window input. Second, letter-
window instances provide the most reliable source of
input for both grapheme-phoneme conversion and
stress assignment. Third, stress assignment and
grapheme-phoneme conversion can be integrated in
one task, i.e., mapping letter instances to 'stressed
phonemes'.
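The instance encoding of the GS system can be sketched as follows; the three-letters-either-side window width and the target classes shown are illustrative assumptions, not the paper's exact encoding:

```python
# Sketch of windowing: each letter of a word, with fixed context on either
# side, becomes one classification instance.
def windows(word, size=3, pad="_"):
    """One fixed-size window per letter, padded at the word edges."""
    padded = pad * size + word + pad * size
    return [padded[i:i + 2 * size + 1] for i in range(len(word))]

# Hypothetical combined 'stressed phoneme' targets: a phoneme symbol plus a
# stress marker ('1' = stressed syllable starts here, '0' = no stress,
# '-' = the letter maps to no phoneme).
targets = ["b1", "U0", "-0", "k0", "I0", "N0", "-0"]
for window, target in zip(windows("booking"), targets):
    print(window, "->", target)
```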
A warning on the scope of these suggestions needs
to be issued. The results described here are not
only dependent on the resource (CELEX) and the
(sub)task definitions (classification of windowed in-
stances), but also on the use of IGTREE as the learn-
ing algorithm. The CELEX data appears robust and
provides an abundance of English word pronunci-
ations, not an inappropriately skewed subset of the
English vocabulary. The windowing method appears
a salient method to rephrase language tasks as clas-
sification tasks based on fixed-length inputs. It is
not clear, however, to what extent IGTREE can be
held responsible for the low accuracy on M-A-G-Y-
S and M-G-S; IGTREE may be negatively sensitive
in terms of generalisation accuracy to irregular er-
rors in the input of a modular subtask. Although
irregular errors are an inherent problem for modu-
lar systems, other learning algorithms may be able
to handle such errors differently. Experiments with
back-propagation learning applied to the same mod-
ular systems show significantly worse performance
than that of IGTREE (Van den Bosch, 1997). It
might be possible that instance-based learning algo-
rithms (e.g., IB1-IG (Daelemans and Van den Bosch,
1992; Daelemans, Van den Bosch, and Weijters,
1997)), which have been demonstrated to outper-
form IGTREE on several language tasks (Daelemans,
Gillis, and Durieux, 1994; Van den Bosch, Daele-
mans, and Weijters, 1996; Van den Bosch, 1997),
perform better on the modular systems. Although
such systems trained with IB1-IG would be compu-
tationally rather inefficient (Van den Bosch, 1997),
employing IB1-IG in learning modular subtasks may
lead to other differences in accuracy between modu-
lar systems.
A conclusion to be drawn from our study is that 
it is possible to learn the complex language task of 
English word pronunciation with a general-purpose 
inductive-learning algorithm, with an adequate level 
of generalisation accuracy. The results suggest that 
the necessity of decomposing word pronunciation
into several subtasks should be reconsidered care-
fully when designing an accuracy-oriented word-
pronunciation system. Undesired errors generated
by sequenced modules may easily outweigh the de-
sired positive utility effects.
Acknowledgements 
We thank Eric Postma, Maria Wolters, David Aha,
Bertjan Busser, Jakub Zavrel, and the other mem-
bers of the Tilburg ILK group for fruitful discus-
sions.

References 
Allen, J., S. Hunnicutt, and D. Klatt. 1987. From text to speech: The MITalk system. Cambridge, UK: Cambridge University Press.
Bloomfield, L. 1933. Language. New York: Holt, Rinehart and Winston.
Breiman, L., J. Friedman, R. Olshen, and C. Stone. 1984. Classification and regression trees. Belmont, CA: Wadsworth International Group.
Burnage, G. 1990. CELEX: A guide for users. Centre for Lexical Information, Nijmegen.
Chomsky, N. and M. Halle. 1968. The sound pattern of 
English. New York, NY: Harper and Row. 
Daelemans, W. 1988. Grafon: A grapheme-to-phoneme system for Dutch. In Proceedings of the Twelfth International Conference on Computational Linguistics (COLING-88), Budapest, pages 133-138.
Daelemans, W. 1996. Experience-driven language acquisition and processing. In M. Van der Avoird and C. Corsius, editors, Proceedings of the CLS Opening Academic Year 1996-1997. Tilburg: CLS, pages 83-95.
Daelemans, W., S. Gillis, and G. Durieux. 1994. The acquisition of stress: a data-oriented approach. Computational Linguistics, 20(3):421-451.
Daelemans, W. and A. Van den Bosch. 1992. Generali- 
sation performance of backpropagation learning on a 
syllabification task. In M. F. J. Drossaers and A. Ni- 
jholt, editors, TWLT3: Connectionism and Natural 
Language Processing, pages 27-37, Enschede. Twente 
University. 
Daelemans, W. and A. Van den Bosch. 1997. Language- 
independent data-oriented grapheme-to-phoneme con- 
version. In J. P. H. Van Santen, R. W. Sproat, J. P. 
Olive, and J. Hirschberg, editors, Progress in Speech 
Processing. Berlin: Springer-Verlag, pages 77-89. 
Daelemans, W., A. Van den Bosch, and A. Weijters. 1997. IGTree: using trees for classification in lazy learning algorithms. Artificial Intelligence Review, 11:407-423.
De Saussure, F. 1916. Cours de linguistique générale. Paris: Payot. Edited posthumously by C. Bally and A. Riedlinger.
Dietterich, T. G., H. Hild, and G. Bakiri. 1995. A comparison of ID3 and backpropagation for English text-to-speech mapping. Machine Learning, 19(1):5-28.
Goldsmith, J. 1976. An overview of autosegmental phonology. Linguistic Analysis, 2:23-68.
Hunnicutt, S. 1976. Phonological rules for a text-to- 
speech system. American Journal of Computational 
Linguistics, Microfiche 57:1-72. 
Hunnicutt, S. 1980. Grapheme-phoneme rules: a review. 
Technical Report STL QPSR 2-3, Speech Transmis- 
sion Laboratory, KTH, Sweden. 
Koskenniemi, K. 1984. A general computational model for wordform recognition and production. In Proceedings of the Tenth International Conference on Computational Linguistics / 22nd Annual Conference of the ACL, pages 178-181.
Liberman, M. and A. Prince. 1977. On stress and linguistic rhythm. Linguistic Inquiry, 8:249-336.
Mitchell, T. 1997. Machine learning. New York, NY: McGraw-Hill.
Mohanan, K. P. 1988. The theory of lexical phonology. Dordrecht: D. Reidel.
Piatelli-Palmarini, M., editor. 1980. Language learning: The debate between Jean Piaget and Noam Chomsky. Cambridge, MA: Harvard University Press.
Quinlan, J. R. 1986. Induction of decision trees. Machine Learning, 1:81-106.
Quinlan, J. R. 1993. C4.5: Programs for machine learning. San Mateo, CA: Morgan Kaufmann.
Sejnowski, T. J. and C. R. Rosenberg. 1987. Parallel networks that learn to pronounce English text. Complex Systems, 1:145-168.
Shavlik, J. W., R. J. Mooney, and G. G. Towell. 1991. 
Symbolic and neural learning algorithms: An experi- 
mental comparison. Machine Learning, 6:111-143. 
Stanfill, C. and D. Waltz. 1986. Toward memory-based reasoning. Communications of the ACM, 29(12):1213-1228.
Van den Bosch, A. 1997. Learning to pronounce written words: a study in inductive language learning. Ph.D. thesis, Universiteit Maastricht.
Van den Bosch, A. and W. Daelemans. 1993. Data-oriented methods for grapheme-to-phoneme conversion. In Proceedings of the 6th Conference of the EACL, pages 45-53.
Van den Bosch, A., W. Daelemans, and A. Weijters. 1996. Morphological analysis as classification: an inductive-learning approach. In K. Oflazer and H. Somers, editors, Proceedings of NeMLaP-2, Ankara, Turkey, pages 79-89.
Weijters, A. 1991. A simple look-up procedure superior to NETtalk? In Proceedings of ICANN-91, Espoo, Finland.
Weiss, S. and C. Kulikowski. 1991. Computer systems that learn. San Mateo, CA: Morgan Kaufmann.
Wolpert, D. H. 1990. Constructing a generalizer superior to NETtalk via a mathematical theory of generalization. Neural Networks, 3:445-452.
Yvon, F. 1996. Prononcer par analogie: motivation, formalisation et évaluation. Ph.D. thesis, École Nationale Supérieure des Télécommunications, Paris.
