ISSUES IN TEXT-TO-SPEECH FOR FRENCH 
Evelyne Tzoukermann 
AT&T Bell Laboratories 
600 Mountain Avenue, Murray tlill, N.J. 07974 
evelyne@rcsearch, art.corn 
Abstract 
This paper reports the progress of the French 
text-to-speech system being developed at AT&T 
Bell Laboratories as part of a larger project for 
multilingual text-to-speech systems, including lan- 
guages such as Spanish, Italian, German, Rus- 
sian, and Chinese. These systems, based on di- 
phone and triphone concatenation, follow the gen- 
eral framework of the Bell Laboratories English 
TTS system \[?\], \[?\]. This paper provides a de- 
scription of the approach, the current status of the 
French text-to-speech project, and some problems 
particular to French. 
1 Introduction 
In this paper, the new French text-to-sIieech sys- 
tem being developed at AT&T is presented; sev- 
eral steps have been already achieved while others 
are still in progress. First we present a brief' de- 
scription of the phonetic inventory of French, with 
a discussion of the approach used to select and 
segment phonetic units for the system. Methods 
for automatic segmentation, and for the choice of 
diphone and triphone units are presented. Some 
comments on durational and prosodic issues fol- 
low. We conclude with some discnssions on direc- 
tions for fllture improw.'ment, including morpho- 
logical analysis, part-of-speech tagging, and par- 
tial phrasal analysis for the purpose of phrasal 
grouping. 
Phonetic Description of 
French 
The French phonetic system consists of 36 
phonemes, including 17 consonants, 16 vowels, 
and 3 semi-vowels. Table 1 shows the different 
phonemes; the IPA column contains the phonemes 
in the standard International Phonetic Alphabi,t; 
the second column ASCII shows the ascii correspon- 
dence of these characters for the text-to-speech 
system, and the third column shows art example 
of the phoneme in a French word. 
Consonant s Vowels 
IPA ASCII WORD IPA ASCII WORD 
p p paix 
t t tout 
k k eas 
b b bas 
iI d dos 
g g gai 
m m mais 
n n liOn 
.p N gagner 
l 1 livre 
f f faux 
s s si 
f S chanter 
v v vive 
z z zero 
3 Z jupe 
r r rare 
i i vive 
e e the 
e g aisc 
a a table 
u a time 
3 > homme 
o o tgt 
u U boue 
y y tour 
n ellX 
ce @ seul 
o & peser 
I bain 
~t A bane 
5 O bon 
de I brun 
o A samedi 
Semi-vowels 
IPA ASCII WORD 
j j yeux 
w w oui 
q W huit 
Table 1: French Phonetic Phonemes 
For the French text-to-speech synthesis system 
we use 35 phonemes, consisting of 17 consonants, 
15 vowels (and not 1{3 like in the n,a cohlmn), and 
3 semi-vowels. As shown in Table 1, the fourth 
nasal /de/ has been removed, /07,/ and /g/ being 
represented by the single phoneme /g/. The rea- 
sons for this change are that (1) /de/ tends to be 
assimilated to the phoneme/g/, and (2) this nasal 
vowel occurs in very few words in French. Thus, 
976 
iC eould be said thai, functionally the disi, inel, ion 
I)(;Cwoon i< 'l and is ininiinal. Prcneti also COil-. 
rains two \])holiOlilOS for 1,he eharaetor "a", /al and 
/q/ , the first ouo hoing a front unrounded vowel 
and the second one abael( romidod vowel. A small 
iillliiboi: of l,'r<;n{:h spcal<crs lli;\[ko this I)roduetion 
ai<l<l i)<;i:C<~l)Cu~-d disl, hietion; in addiiAon, Coday's 
tendency shows a dis;-q)t>caraii<:o of 1,his I)honeniie 
disc.l, hieCion. Therefore, ouly /a/, the IliOsL {'orii-. 
liiOll t>holiellle of the Iwo> was roCahiod for s/nthc 
sis. NoCiee thaC I, wo dilfcronC "sehwas" (or lilllto I;~)> 
,,la,.kod as It+/and /A/wc,:o retaino{l for synl, hc 
sis; sin<'.<e sehwa in spokeil l"rcneh ca, it t)~, iu SOlliO 
crises, prosollC or not dcpondiug 011 i, hc level of 
fornlality of i&ilgu&ge it is iisot'ul Co ll~-wo Owe 
dilfo=renC signs Co aeeounl for I.tiis option, l,l addi- 
tion, Cho graphcnio-I;o:pholieirie systelil IlSOd ill the 
Vronch TTS sysColn and dose.ribod hi SocCion ??, 
is o=quipp<;d wiCh the Cal>at)ility of ineh.ling or .ot 
I, Ilc schwa <lol>on<ling on the lc'w'.l ot' language. For 
ex~-~inplo> Clio sonl;onco "jo Ill'Oil V~-tis s;uncdi", I ai'i'l 
h:auiny on saturday, (;21.II lie said <tither/3,) lll('l Vg 
samc)<li/ or, liioro eolloquially, /:,;mh ve samdi/ , 
dot)ondiug on whether the schwa ix reduced or noC. 
In olir systoiii, l;ho solil,(;llCO will t>o I, ra, nseribcd t:+/x 
HI('I Vg sanlAdi/, A ;-t(;eOlllitiilg; for the Cl';tce of 
the schwa. An ad<liCioual eilaraccer "*", was uso<l 
to r{qpro.sent silences aC the Iwginlling and end of 
WOl'(ts, 
Ig'onch Idlouenies (:au also he viewed ac<:ord 
itig t(> their Sl)OeCfal variabilil.y iu the eont,oxC of 
oCher i>honoliiOS, li, is knowil thaC l,'ronch vowels 
show spectral stability ;MIll low c()llt(~XCllltl vari 
ahility \[?\], \[?\]. '1't1{; voiceless f,.icaCivos show some- 
wiled; less spoeCra.l sCal>iliCy, I;tioai Chc plosives. The 
nasals and voieod fricatives present ow!n less sCa= 
hility. Ifi<luids l/l/ and/r/)and semi vowels l/j/, 
/w/, /q/) arc the i>ho,~omcs showing high vari- 
ahiliCy a,n(l this poses prot>hmis in diphono hasod 
synl;hosis \[?\]. Liquids ai'o very scrisiCive Co lh<='ir 
eolltoxC; forinaAiC strllei.;tlres show subsCanl, ial cf 
fects of c.oart, icuhd.ioti. As for the s<;mi-vowols, il. 
is ditliculC I.o ot~t)Clll'O Che ZOllO of spec.tral stability. 
For those' reasons, some researchers, o.g. \[?\], 
orgauizc l)iionernie classi(i<:ation using Che crit<;- 
ria of the stable vs unsi;ablc phone.me raChor than 
place of arCieulation. Sinii\]ar to Clio approach in 
Ill<'. l'\]nglish TTS sysCerii, syi'lthesis for French is 
(tolie using f>restorc'<\] liilil,s. Within this frainc 
work, there are various stralo<~gies for 1,he colh'.o-- 
l, ion of uniCs, units i,hat will then eonsl, iCui;e the 
dicl, ionary of polyphonos. 1)lie to Chc eoil{inuo.\[ 
a.spe<:l; of the speech signal and tile fact chaC the 
lt&Cllro of l)honenies is greatly modified in the 
eouix~xl of ol hor phon<mlc's, SylllrhOSiZiIlg separat.~ 
pIIOliONIOS ea.niioC (:a.pCllro ;trticllla.Cory aSl>ocCs of 
the languag(!. Ad(lil, ioually, transitions are harder 
Co modo.I I,han steady staCo.s. Thus, diphones are 
l&c standard minimal uniCs in segmental synCho 
sis. Froln an acoustic stan(IpoinC, a diphollo (;ram 
be seen as a signal passing from/,he co.Cral parC of 
~ !c)holmm('. Co the central pare of the sut>soquelHi 
ph<mcmo; iu oth<~r words, it is a unit oonllmSed 
of Cwo half phonemo.s. At a sogmo.nt,al low:l, one 
cau Chink of a diphone as a sCored length el'st>etch 
ChaC goes fi:om nt'm: the target of one phonelne {tilt\[ 
cxCen<ls Co near I.he t;-trg~.'C of Cho followiug one, ia 
ocher word l.ho CransiCion \[?\]. 
'l'h<~ earliest diphono, systcln was <loscrihed hy 
I'oCcrson oC al \[?\]; ocher <liphono apl>roa<:hes have. 
been roi>orC<xl by \[?\], \[?\], \[?\], an<t \[?\]. AlChough 
there are only about 40 phonemo.s in/"nglish al)out 
1600 diphonos sulfieo= for synthesis. Nev<;rthe 
less) b('.eaaise of lllllNerOLlS allophono.s and the face 
that some dil>hones are not really conCexC floe, re' 
searchers like I'ctcrsou suggesl, that, aboul. 8000 
<tiphoHes are nce<t<xl for high quality <liphone syn.- 
thesis. Moreover> the vowel diphtongs in gnglistl 
could be trcato.d as peudo-diphones, l,'arly Iq'cneh 
synthesis systems \[?\] relied also on sym, hesis by 
diphouos exc<'pt for the. diphone \[qi\] that is into 
gratc'{l in a Cril)honi<: group. This phonemic pair 
was sCore<l diff<,rontly hoeauso of its high fr<!qu<mcy 
iu lg'onch in oe<:urrcnces such as "hii" him~her. In 
lnoro recent work, systelliS (;olltaiil diphonos and 
larger units, such as Cril>hones , quadriphonos> and 
evol, q,,intophonos \[?\] \[?\], iu order to capture 
eoarticu\[a.tory lihononio.na of a longer domain that 
would iloli be adequately irio<l<'.lcd in a stric.tly di- 
\]>honic system. 
lu the current sysCem, the dil>hone invcutory for 
lq'ench was built by taking 35 ~ phonernic pairs, 
Chat is 1225 ilnits. Ad<lod Co that was Clio silence 
symbol in initial and final position, which adds 
a, lioChor 70 phoneniic \[)aii:s, \[gl'OIH this iniCial sol;, 
l, he pairs of se.lni-vowels wcrc relnow;d. All the 
ottior <x)mt)inations were kept. Even though all of 
th('.Ill do llOt oecllr ill French lexical strueCure, they 
<:a. still app<!ar in tile intcr-wor<l boundaries. For 
oxaml>lc , the sequence /lr/ is not permiCted word 
internally, but imist be handled since it appears 
in the interwor<l assimilation in /val r.jc/ "valenC 
rion" cost 'n, othiny. This is partieularly iinportant 
in French sin<:e inter-word liaison is comnion as 
in /el z 5/ "ell<;s ont" they have vs /el s5/ "olios 
sont" they are, whero the final consonant/s/eithor 
undergo0s liaison wiC\]l the vowo,1 /5/ rosulting in 
/z/, or undergoes linking with the consonani, ts/ 
977 
resulting in the devoiced sibilant. 
2.1 Diphone Structure and Selec- 
tion of Carrier Word 
2.1.1 Structure of Diphones 
This section discusses the nature, of the diphone set 
and the manner in which diphones were collected. 
Diphones are structured as \[bllows: 
*V, *C, g*, C*, CV, VC, CO, gV 
where * is a silence, C a consonant, and V a 
vowel. Semi vowels were treated in the same fash- 
ion as consonants. Diphones were recorded fol- 
lowing two (lif\[erent strategies: the first one con- 
sisted of picking existing words from a dictionary 
list. The second consisted of deciding on a neu- 
tral phonetic context in using logatornes or non- 
existing words. Logatomes are phonotactiically 
well-formed strings, which do not exist as words 
in the current French language. 
2.1.2 Selection of existing words from 
machine-readable dictionary 
A word list was extracted from a subset of the 
Robert French dictionary \[?\] and the pronuncia- 
tion fields were extracted. The dictionary contains 
almost 89,000 entries, of which 85,796 entries con- 
Lain a headword, a phonemic transcription, and a 
part of speech. The remaining entries are prefixes 
and suffixes. The first task consisted of convert- 
ing and mapping the dictionary phonemic symbols 
to the ones adopted in our system (shown in ta- 
ble 1). This was not straightforward since there 
was not always a one-to-one mapping between the 
two sets. For handling symbol mapping, a pro- 
gram was written that converts any set of charac- 
ters to any other set of characters I. The program 
is developed so that characters coded in octal or 
decimal code not only can he translated in either 
code, but also can be input in ascii format for be- 
ing converted 2 
Quite often, there was more than one pronunci- 
ation in the phonetic field and the. pattern match- 
ing program chose the pronunciation correspond- 
ing to the one required. Moreover, dictionary pro- 
11 am very grateflfl to Mike Tanenblatt who wrote this 
program and made a succession of changes until complete 
flexibility of character conversion was obtained. 
2 This tool allowed the conversion of databases originally 
written on Macintosh, PC, or Unix. Additionally, we used 
it to convert all the French textual databases into latin1 8 
bit encoding format. 
nouneiation fields are often not phonetically line- 
grained enough for acceptable speech output (see 
\[?\] for a discussion on machine-readable dictionar- 
ies in text-to-speech systems). Finally, due to the 
lack of explicit inflectional information for nouns 
and adjectives, only the non-inflected forms of the 
entries were extracted during dictionary lookup. 
Sirnilarly for verbs, only the infinil;iwd forms were 
used since the dictionary does not list the intleeted 
forms as headwords. A program was written to 
search through the dictionary pronunciation field 
and select the longest word where the phonenm 
pairs would be in mid-syllable position in order to 
avoid the extraction of' phonemes occuring at the 
beginning or end of words. In this way, l, he influ- 
ence of lexicM stress was reduced. The orthog- 
raphy/prommciation pair \[headword_orth, head- 
wordq~)hon\] was extracted and headword_orth was 
placed in a carrier sentence for recording. Out 
of 1225 original phonemic pairs, 874 words wet'{'. 
found with at least one occurence of the pair. Be 
cause 1225 is the number of all phonemic pairs in 
French whether they are allowed or not, it is inter- 
esting to notice that only 874 pairs occur within 
real words in the Robert dictionary. 
2.1.3 Selection of logatomes 
For the logato.tes, two phonen,es /a/ attd /t/ were 
used to encompass the selected diphone, since they 
appear to be fairly stable from a phonetic-acousLic 
standpoint. In order to balance the alternation of 
vowel and consonant, the words were constructed 
as follows: 
Logatome position 
initial vow. 
initial cons. 
final vow. 
final cons. 
COILS. VOW. 
VOWel (;ons. 
cons.1 cons.2 
vow.1 vow.2 
Structure Example 
*V-ta ota 
*C-ata bata 
at-V* at() 
ta-C* Lab 
ata-CV-ta atabota 
at-VC-aba atit}ata 
ata-C,C-ata at.akrata 
at-VV-ta atoata 
"Fable 2: Phonotactic structure of logatomes 
All strings were generated in this way, ewm if 
they were not phonotaeticMly well-formed for iso- 
lated words in the language. Nonetheless, these 
R)rms were generated and used since they were 
necessary for interword phenomena. Approxi- 
mately 1225 words were constructed following the 
ab ow~' model. 
978 
l{.esearchers disagree as to whether to use 
logatomes or real words for synthesis. The ar- 
gmnent for using logatomcs is that it is t)etter 
to collect non-real words so that the diphone ix 
recorded as neutrally as possible and does uot un- 
dergo any real word stress. Those against argue 
thai; the (\[iphone is ov(~r-articulated in a logatome 
environment and that it reduces l, he naturahmss of 
the synthesized sl)eech. The choice is more corn 
plex in the sense that it greatly depends on the 
speaker, the articulation, and the comfort in read 
ing the two diff('.rent sets. Given the controversy, 
in the present system, we decided to record the 
l)houemie t)airs in bot;h environments, so thai we 
(:ould choose the best ones, 
2.2 The other polyphonic units 
I)ue to the variability of liquids and semi-vowels, 
synthesis based only on (liphones will uot give 
good results. Indeed, such systems have provcu 
to be insut\[icient. Researchers \[?\] argue l;ha| di- 
\])holle COllcat(?ll~ttion alolle is llOt a(l(2(ltlate or sl/f- 
ticient, particularly for complex transitions. \[?\] 
claims that "Meal diphones with perfect (;oncatc- 
nation would giw~ imperf(~ct results". Complex 
polypho,~es are not equivalent to concatenated all- 
phones. Therefore, louger concatcnativ(~ units are 
necessary. Polyphones are defined by \['/\] as be- 
ing a segmental unit where the initial and linal 
phoneme are not subject to variability, thus, ex- 
cluding liquids and semi-vowels. 
The strategy chosen in the Fre.nch system re- 
lies on some phonetic ge.neralities to build a set of 
tril)honcs. It was decided a to form a (:lass of tri- 
phoues, based on the following transition: I'VC'~ , 
where 1 ) is a phoneme, V a vowel, and Cc a conso 
naut rel)resenl,ative of the ~trticulatory locations, 
i.e. one velar, one dental, and one nasal. The 
set consisted then of 35 phones x 14 vowels x 3 
consonants = t47() triphon(:s. The same method- 
ology used for building the set of (liptlones was 
used for the triphon(~'s. These were inchMe(I in a 
carrier word for the logatomes and extracted from 
the dictionary for the real words. 
Researchers disagree on which criteria are best 
for the selection of triphones; should the selectiou 
rely on phonectic-a<:oustic <widence, or on statisti- 
cal evidence, related to tl,e fi'equency of occurrence 
of triphones in the language? Then, once the (:rite- 
ria is defined, which triphones shouhl be selected? 
Can candidates of a class (say the phoneme /p/ 
3 personal communication with Joe Olive 
representing all the stops, the phoneme./v/ repre- 
seutmg all the fficatiw~s) be picked to rel)resent a 
class or sit(mid all the phonemes belonging to the 
class he sekwted? Resenreh is underway in this 
a,~a using a phone,,, clustering approach \[':\], \[':\] 
that allows the sehx:tion of segnwaltal units fi'om a 
database of I)honemes containing several instances 
of the same phoneme. Tim extraction is made at 
a spectral point common to the pho,wmes. Fi- 
nally, he.cause the nnml)er of selected units atfects 
results, the choi('e of polyphones must be Ilia(h! 
with care. 'l'aking illto accotlrlt the size limita 
lion, one has to balanc(~ out the choice of the poly 
phones considering its frequency in ~he. language. 
This brings in the additional complexity of cort)us 
selection (its language properties, dialects, socio 
linguisl,ic tyl)e of language, topic, and size). 
\[?\] applies a series of rules on phoneum coln- 
bination to exclude inter-word concatenation that 
would not occur in French. For example, one can- 
not lind a glide, in I'~rench that ix not in the left 
or right cont;ext of a vowel; therefore, the combi- 
nation consonant-glide-consonant is excluded. An 
optimal set of polyphone combinations is com- 
puted that re.aches a tmmber of 7725 units. Calcu- 
lated from texts, statistics are then run on these 
illlits to (teterlllille the most freqllellt oc(;iH'elH;es 
in French, and the numbex of units is lowered to 
3000, It remains to be seen whether this al)proacll 
is successfidl iu a workiug system. 
2.3 Construction of the corpus 
A carrier sentence "C'est CAI~I~ItgI~_WOnl) qlle JC 
dis" was selected to fulfill the following require 
i'qeill.s: 
• short sentelice~ to record, 
• ability t() surrourid the' carrier word to avoid 
selfl, ential accent and effects, 
• phonetically neutral environment. 
2.4 Choice of a Speaker 
l?ive male natiw: speakers of Continental French 
were interviewed for selc'cting tile voice of the 
lq'eneh synthesizer. A sample of text represent- 
ing highly o('(;uring graphemic trigrams wax pre- 
pare.d to be used in this task. The corpus wax 
run through a greedy algorithm 4 that returned 
the most frequent words within their sentences 
4'|'hanks to .Inn Van Santen for developing and running 
his greedy algorithm. 
979 
along with a measure corresponding to the cov- 
erage of the graphemic triphone. Once tile sample 
was recorded by tire 5 speakers, the natural voices 
were run through LPC analysis and re-synthesize.d 
in order to judge the resistance of tile voice to 
synthesis. Five subjects were asked to give their 
judgcrnent on the following criteria: 
clear articulation: tile voice was carefully 
listened to evaluate tire articulation of the 
speaker. Subjective perceptual judgements 
were lnade. 
2. neutral French accent: the candidate was 
asked about tile areas of Franc(: where he 
grew up. The central area of France "l'Ile de 
France" is known for its neutral accent and is 
regarded as being a well-received accent. Ad- 
ditionally, for French native speakers resid- 
ing in the USA, particular attention was paid 
to the influence of English in tire prommcia- 
lion of French, especially for English borrow- 
ings, such as for example, the company name 
AT&T to be pronounced/a te re/(the French 
way) and not; /el t n t/ as in English. 
regularity: special attention was given to en- 
sure that the speaker would have a reason- 
able degree of regularity in uttering French 
phonemes. 
ph:asantness of the voic(.': the subjects doing 
the evaluation were asked to give their opinion 
on the pleasantness of the voice, in particular 
the timber, the level of nasality, and the into- 
nation. Of course, this is a highly subjective 
matter but a critical one for success. 
2.5 Recording Conditions 
The recording was done on four non-consecutive 
days under the following conditions. Thc sen 
tences were recorded directly onto the com- 
puter through a 1)AT (Digital audio 'rape) tape 
recorder, using interactive software allowing easy 
reading and repetition of the sentences Lo be 
recorded. Additional time was devoted to the 
recording of triphones as well as the re-recording of 
sentences that were improperly uttered. The same 
carrier sentence and a regular prosodic context 
was carefully maintained so that there was mini- 
real suprasegmental variation. Once the recording 
was done, the 48 kHz digitized acoustic signal was 
downsized to 12 kllz. 
2.6 Transcription of recording lna- 
terial 
For the recording, all sentences were transcribed 
from the phonetic alphabet to an orthographic 
Ibrrnat. This was done to allow tile speaker 
to utter sent(;nees with more naturalness. Once 
the recording was dorlc'~ th(" sentences were setni- 
automatically re-transcribed into phonc%c form. 
For some~ utterances, the phon('tic transcription 
was manually adjusted to the idiosynerasi(;s of the 
speaker. For example, it often happened that 
confusion arises between open and closed vow- 
els, such in the ~ord '~zoologique" zoological that 
can be pronomtced either/zooloaik/or/zaalosik/. 
In case the output was /zooloaik/ instead of the 
expe(%ed /zaalosik/, the transcription was read- 
justed. 
2.7 Segmentation 
Segmentation is presently in progress; efforts are 
being pursued to adapt an automatic segmentor 
for English to French and other languages. In the 
meantime, rnannal segmentation is being done as 
a pilot experiment in order to cheek the accuracy 
of automatic segmentation. Beyond the scope of 
this paper are many complex issues raised in seg- 
menting French, such as the segmentation of semi- 
voweds (/j/, /w/, and /q/) and liquids (/l/ and 
/r/), each of these phonemes being quite unstable 
f¥om a phonetic-~eoustic standpoint. These issues 
will be addressed in hmm; work. 
2.8 Integration of an orthographic 
transcriber 
A grapheme-to-phoneme transcriber \[?\] was ac- 
quired to convert French orthography to a phone.- 
mie representation. The software performs some 
syntactic and partial semantic analysis of the sen- 
tence in order to disambiguate the input string. 
Once performed, spellings at0. converted in a se- 
ries of steps into a phonernic representation. 
3 Issues in Text Analysis 
We have t)nrsued work in the text analysis of 
French in order to obtain linguistic data for in- 
tonation and prosody; additionally, the output of 
the work will be used in the translation project. 
This aspect of the work has entailed several points: 
• acquisition of a large French dictionary: 
lt.obcrt Encyclopedic dictionary (containing 
980 
over 851¢ ent,rie.s, 80k articles, 160k ei- 
tatious, analogical terms (synonyms, ho- 
n|onlylns, el,(;), and conjugatiou tables for 
illOSt l?rerl(;\[l verl)s), 
• collectiol~ of French corpora: 
i,'rcneh news from LI'; M()NI)I'; \[?\] 
I/'retlch news daily <:ompih'.d by the 
French embassy in Washington DC 
(24657K byt;es arc now en<:oded, and a 
monthly update is being done.). Tim 
data are in ascii and aeeeltts were re= 
stored using oue of the features of the 
gral>hetue:l,o: pholmlne software. An 
other \[)rogl?aliq was writl, en to alltOtll&t 
ieally cletm aim norntalize these e-mail 
\[or|nat d ;-d.a. 
extraction of some of the, I.Lot>crt di(> 
tionary datal>as<~s: the 160,000 citations 
tTrom literary Fren<h authors are being 
extracted so thai they cau constitut(~ 
8Ollle I'(?\](;V~-tl|t; tort)us data. A fl:alllOWOrk 
is being worked out so that cital, ion au-- 
thor can I)e retrhwed ou an optional I)a 
SiS. 
• en(:oding of French data using the. ah'eady ('x- 
\[sting sch<'=n|c de.velof)ed I)y \['?\] and enhanee<t 
by \[?\]. This sche.me= allows the use of the con= 
cordal~ce program. As \[,\]nglish data are en- 
coded in 7 bit characters, au 8 I>it encoding 
format was worked out to allow the retriewd 
of French text with accents 5 For exmnl>le , 
{fie Hllaccellte(l word Xcot(?" ill Freuch can be 
several words: "(;ot('\]' with llo ~4c(;(?llt Hl(?all- 
ing quotation, rating, "c6te" meaning coast, 
aud "<'6t6" meaniug .sidt all these transla 
{ions I>eing also valid in the figurative s<',t~se. 
Thus, a latinl compatible window wouhl dis 
play lg'eneh corpora with accents; in the fol- 
lowing examph:, the l>rogram returns all in= 
tall(:es of the word "(;ore" ((l/iotatioll, ratit~g) 
in the database "l,e Mend{". The <luery to 
the, syste.m will retric'w~ all the l,'rench sen- 
l, ellces where the exact Hlal,('h to th(! charac- 
ters "<:ore" will o(;eltr, and neither of the other 
st)tiling: 
The query producing table ?? returned in= 
formatiou of "1(' ( ' ~. Men l only, as requested. 
\[n specifying "FRI';N" for Fren<:.h, the follow- 
ing query in Table '?'." returns all install(:es of' 
51 am w:ry grateful to l)avid Yarowsky for m.:oding the 
\[*¥1!I Ich data. 
T,,tah 
9 : ,:,,tu 
MONIDE ~7737: inltond,r~ble. I,a c,.te changcait d' hculc 
Mf)NI)E 1 t,~656: d,)pause l~ cole 2 1111o 
MONI)Iq 33162~H att-,lessus (le la c¢,le 3D~\], et 1~ 
MONI)I'; ~1:I5288: ({Qll\]¢lltfellSC, ,'ga cllie de l)llplllglit~ 
MON\[31,} (;88281: ~lols que la cote <1, VM,~ry (ilsc~r,l 
MONDF) 7{11355: par nnc ,:ot~ de poplll~itg. 
Table 3: Some concordances of the word "cot<'" in 
the databas(, "l,e Monde" 
the word "cote" ill the thre.e Fr(mch cor\[)ora. 
Moreover, the " i" option allows the retrieval 
of all instances of a word with or without ac-. 
cent, therefore {,he three l/'rench words "cote", 
"trite", and"c6t6". For more information on 
tlw use of the concordance tools, refer to \[?\], 
M^tch: ,:,,t(, 
'l'l,t al: 9 
9 : ,,,,re 
MONDH 2673: pied ~ul 1^ e6le qu' \[Is serMcllt 
MONDI; 3835: i,r~vu : l& 1:611~ dll C:Mvados. 
MONDE ,ID811 de (cux de la 1:61e :~u,4 de 
MONI)Iq 41MIII~: unlvc;~it&. ,le i~ ,:file .xtl~ntlque. 
AI"P 257O: 8avel~t. (:6t6 t r~vailliste 
AI"P .131;,16 : plait Shalllll, (i gt i~ \[stYli, T~ 
AI,'P 53874; , n:Au~nis, eli (1¢%t6 d' Oi 
At+l" 12679,1: { 'alne/nun , v~6t ~ {t I \[vail,:, 
Al'l' 181788: s/.c it tlt g " Q16 ( I~ liban ais 
AI,'p 1881O1 : gn. iii " (:~lt ~ \[ialtc,&~8 
IIANNI" 26738~;: £ lnettr, dc ~:t%t~ I' antlp~l\]ti+: 
IIANS;\[.' 271932: tl~ s,:nMblc ,\[u e61d des tninlstbri,Hs 
I\[ANSI,' 272137: tie 1' ,llltlc Cgtd dr! lh 111, , 
IIANEI" 27h,5011: dc I' autle c6l~ de la {:halllblt 
\]IANSI" 276522: arlitrc-b~llU dl\[ 1:\[~1 ~ d ~1 ~lte\]lI 
Tabl<~ 4: Some <:oneor<tances of the 
all \['~r{mch databases 
word "cote" ill 
• development of a morphoh)gical analyzer and 
generator for French, using finite-s~at, e trans 
ducer: the system is built with art approach 
similar to the on{ fbr Spanish \[?\]; it, is mainly 
base<\] on th<, headwords of the Robert dictio: 
nary. 
• ~,c(:ent filters: conw~rsion tables are still being 
produced at ea(;h tim<=' a new datat>asc arrives 
that, is not in a compatible form. 
4 Conclusion 
The French TTS system is part of a large project 
of multiliugua\] text-to-sl>eech synthesis ill progress 
at AT£'T Bell I~ahoratories. Speech synthesis 
for French brings a variety of chalh!nges, some of 
which are specific to French, such as nasalization, 
liaison, schwa realization, etc. and some of which 
are more general issues, such as vowel hmgthen= 
ing, prosodic cont, ouring, and intonation. Several 
systems are. in exp<!rin|e|d;al stages for other lan- 
guages, such as Spanish (Castilian as well as South 
Ame.ri<'ar0, Italian, Chinese, Navajo, German, aud 
Russian. Once C, ontinental French is eomph'.ted, 
981 
we also intend to build a TTS system for Canadian 
French. 
References 
\[1\] Veronique Aul)ergd. La synth~se de la parole: 
'des r@les aux lezique'. I'M) thesis, (lniver- 
sitd de Grenoble, Grenoble, France, 1991. 
\[2\] Jared 11ernstein. Speech synthesis: System 
design and applications, pages 39 42. Na- 
tional Computer Conference, 1987. 
\[3\] Fr~,ddric Bimbot. Synth~se de la Parole: Des 
Segments aux r@les, avec utilisalion de la 
ddcomposition lemporelle. PhD thesis, Tele- 
corn Paris 88 EOl9, Paris, France, 1988. 
\[4\] M. Chafeouloff. Les propridtds acoustiques 
de \[j, y, 1, r\] en franeais, volume 6, pages 
10 24. Travaux de l'Institut de Phondtique 
d'Aix, 1979. 
\[5\] N.R. Dixon and H. D. Maxey. Terminal arm- 
log synthesis of continuous st,eech using the 
diphone method of segment assembly. In 
\[EEE 7)'ansactions on Audio and Electroa- 
cousties A U16, pages 40 50, 1968. 
\[6\] Fxancoise Emerard. Synth~se par diphones el 
traitement de la prosodic. PhD thesis, Univer- 
sitd de Grenoble Ill, Grenoble, France, 1977. 
\[7\] Alain Duval et al. Robert Encyclopedic Dic- 
tionary (CD-ROM). Hachette, Paris, 1992. 
\[8\] Judith Klavans and Evelyn(; Tzoukermann. 
The use of machine-readable dictionaries in 
text to speech, under review, 1994. 
\[9\] P. Laferrib.re, G. Chollet, L. Miclet, and J.P. 
Tubaeh. Segmentation d'une base de donndes 
de 'polyson', application 5, la synth;;se de pa- 
role. In JEP: 1~, pages t07 .110, 1985. 
\[10\] L'histoire au jour le jour 194:4-1991. In ver- 
sion 199~. CD-ROM, 1992. 
\[11\[ F, Marty. Trois systbmes informatiques de 
transcription phondtique et graphdmique. Le 
Franeais Moderne, LX, 2:179 197, 1992, 
\[12\[ L, Miclet. Enregistrernent d'une base 
de donndes vocales. In LAA/TSS/RCP 
E.N.S.T. CNET, 1984. 
\[13\] Joe P, ()live. Rule synthesis of speech fi'om 
dyadic units. In Proceedings of the IEEE- 
ICAS"SP, pages 569 570, 1977. 
\[14\] Joe P. Olive. A new Mgorithm for a eoncate- 
native speech synthesis system using an aug- 
mented acoustic inventory of speech sounds. 
In Gdrard Bailly arid Christian Benoit, edi- 
tors, Proceedings of the ESCA Work:shop on 
Speech Synthesis, 1990. 
\[15\] Joe P. ()live and Mark Y. Liherman. A set of 
concatenative ,,nits for speech synthesis. In 
In J. ,I. Wolf and I). t1. Klatt, editors, Speech 
Communication Papers Presenled at the 97th. 
Meeting of the Acoustical Society of America, 
pages 515 518, New York: American Insti- 
tute of Physics, 1!)79. 
\[16\] G. E. Peterson, W.S.Y. Wang, and E. Siver- 
stem Segmentations techniques in speech syn- 
thesis. Journal of the Acoustical Society of 
America, 30:8:739 749, 1958. 
\[17\] Scott Rosenherg and Joseph 1'. Olive. Ex- 
pediting the selection and clipping of nmlti- 
phone sequences for acoustic inventory. 
In 11222-930830-I2TM, Murray Hill, N,a., 
USA, 1993. Technical Memorandum, AT& 
Bell Laboratories. 
\[18\] ,lira Rowley. Phoneme clustering tools. 
In 11222-931123-25TM, Murray flill, N.J., 
USA, 1993. Technical Memorandum, AT& 
Bell Laboratories. 
\[19\] R. Schwartz, J. Klovst, ad, J. Makhoul, 
D. Klatt, and V. Zue. l)iphone synthesis 
for phonetic vocoding. In Proceedings of the 
LEI3E-ICASSP, page 891, 1979. 
\[20\] Evelyne Tzoukermann and Mark Y. Liber- 
man. A finite-state morphological proces- 
sor for spanish. In Proceedings of Colin990 , 
Helsinki, Finland, 1990. International Con\['er- 
(.'tic(; on Computational Linguistics. 
\[21\] Church Kenneth W. Concordancc's for par- 
allel text. Oxford, England, 1991. Seventh 
Annual Conference of the UW Centre for the. 
New ()El) and Text Research. 
\[22\] David Yarowsky. Cone: Tools for text cor 
pora. In 1.1222-921222-29TM, Murray ltill, 
N.J., USA, t992. Technical Memorandum, 
AT& Bell Laboratories. 
982 
