<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-2157">
  <Title>ISSUES IN TEXT-TO-SPEECH FOR FRENCH</Title>
  <Section position="1" start_page="0" end_page="0" type="metho">
    <SectionTitle>
ISSUES IN TEXT-TO-SPEECH FOR FRENCH
Evelyne Tzoukermann
AT&amp;T Bell Laboratories
600 Mountain Avenue, Murray Hill, N.J. 07974
</SectionTitle>
    <Paragraph position="0"/>
  </Section>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> This paper reports the progress of the French text-to-speech system being developed at AT&amp;T Bell Laboratories as part of a larger project for multilingual text-to-speech systems, including languages such as Spanish, Italian, German, Russian, and Chinese. These systems, based on diphone and triphone concatenation, follow the general framework of the Bell Laboratories English TTS system \[?\], \[?\]. This paper provides a description of the approach, the current status of the French text-to-speech project, and some problems particular to French.</Paragraph>
  </Section>
  <Section position="3" start_page="0" end_page="979" type="metho">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> In this paper, the new French text-to-speech system being developed at AT&amp;T is presented; several steps have already been achieved while others are still in progress. First we present a brief description of the phonetic inventory of French, with a discussion of the approach used to select and segment phonetic units for the system. Methods for automatic segmentation and for the choice of diphone and triphone units are presented. Some comments on durational and prosodic issues follow. We conclude with some discussion of directions for future improvement, including morphological analysis, part-of-speech tagging, and partial phrasal analysis for the purpose of phrasal grouping.</Paragraph>
    <Section position="1" start_page="0" end_page="977" type="sub_section">
      <SectionTitle>
Phonetic Description of
French
</SectionTitle>
      <Paragraph position="0"> The French phonetic system consists of 36 phonemes, including 17 consonants, 16 vowels, and 3 semi-vowels. Table 1 shows the different phonemes; the IPA column contains the phonemes in the standard International Phonetic Alphabet; the second column, ASCII, shows the ASCII correspondence of these characters for the text-to-speech system, and the third column shows an example of the phoneme in a French word.</Paragraph>
      <Paragraph position="1"> For the French text-to-speech synthesis system we use 35 phonemes, consisting of 17 consonants, 15 vowels (and not 16 as in the IPA column), and 3 semi-vowels. As shown in Table 1, the fourth nasal /œ̃/ has been removed, /œ̃/ and /ɛ̃/ being represented by the single phoneme /ɛ̃/. The reasons for this change are that (1) /œ̃/ tends to be assimilated to the phoneme /ɛ̃/, and (2) this nasal vowel occurs in very few words in French. Thus, it could be said that functionally the distinction between /œ̃/ and /ɛ̃/ is minimal. French also contains two phonemes for the character &quot;a&quot;, /a/ and /ɑ/, the first one being a front unrounded vowel and the second one a back rounded vowel. A small number of French speakers make this production and perceptual distinction; in addition, today's tendency shows a disappearance of this phonemic distinction. Therefore, only /a/, the most common phoneme of the two, was retained for synthesis. Notice that two different &quot;schwas&quot; (or mute e), marked as /ə/ and /A/, were retained for synthesis; since schwa in spoken French can be, in some cases, present or not depending on the level of formality of language, it is useful to have two different signs to account for this option. In addition, the grapheme-to-phoneme system used in the French TTS system and described in Section ?? is equipped with the capability of including or not the schwa depending on the level of language. For example, the sentence &quot;je m'en vais samedi&quot;, I am leaving on Saturday, can be said either /ʒə mɑ̃ vɛ samədi/ or, more colloquially, /ʒmɑ̃ ve samdi/, depending on whether the schwa is reduced or not.</Paragraph>
      <Paragraph position="2"> In our system, the sentence will be transcribed /ʒə mɑ̃ vɛ samAdi/, A accounting for the trace of the schwa. An additional character, &quot;*&quot;, was used to represent silences at the beginning and end of words. French phonemes can also be viewed according to their spectral variability in the context of other phonemes. It is known that French vowels show spectral stability and low contextual variability \[?\], \[?\]. The voiceless fricatives show somewhat less spectral stability, then the plosives. The nasals and voiced fricatives present even less stability. Liquids (/l/ and /r/) and semi-vowels (/j/, /w/, /ɥ/) are the phonemes showing high variability and this poses problems in diphone-based synthesis \[?\]. Liquids are very sensitive to their context; formant structures show substantial effects of coarticulation. As for the semi-vowels, it is difficult to capture the zone of spectral stability. For these reasons, some researchers, e.g. \[?\], organize phonemic classification using the criteria of the stable vs. unstable phoneme rather than place of articulation. Similar to the approach in the English TTS system, synthesis for French is done using prestored units. Within this framework, there are various strategies for the collection of units, units that will then constitute the dictionary of polyphones. Due to the continuous aspect of the speech signal and the fact that the nature of phonemes is greatly modified in the context of other phonemes, synthesizing separate phonemes cannot capture articulatory aspects of the language. Additionally, transitions are harder to model than steady states. Thus, diphones are the standard minimal units in segmental synthesis. From an acoustic standpoint, a diphone can be seen as a signal passing from the central part of a phoneme to the central part of the subsequent phoneme; in other words, it is a unit composed of two half phonemes. At a segmental level, one can think of a diphone as a stored length of speech that goes from near the target of one phoneme and extends to near the target of the following one, in other words the transition \[?\].</Paragraph>
      <Paragraph position="3"> The earliest diphone system was described by Peterson et al. \[?\]; other diphone approaches have</Paragraph>
      <Paragraph position="4"> been reported by \[?\], \[?\], \[?\], and \[?\]. Although there are only about 40 phonemes in English, about 1600 diphones suffice for synthesis. Nevertheless, because of numerous allophones and the fact that some diphones are not really context free, researchers like Peterson suggest that about 8000 diphones are needed for high quality diphone synthesis. Moreover, the vowel diphthongs in English could be treated as pseudo-diphones. Early French synthesis systems \[?\] also relied on synthesis by diphones, except for the diphone \[ɥi\], which is integrated in a triphonic group. This phonemic pair was stored differently because of its high frequency in French in occurrences such as &quot;lui&quot; him/her. In more recent work, systems contain diphones and larger units, such as triphones, quadriphones, and even quintophones \[?\] \[?\], in order to capture coarticulatory phenomena of a longer domain that would not be adequately modeled in a strictly diphonic system.</Paragraph>
      <Paragraph position="5"> In the current system, the diphone inventory for French was built by taking 35² phonemic pairs, that is 1225 units. Added to that was the silence symbol in initial and final position, which adds another 70 phonemic pairs. From this initial set, the pairs of semi-vowels were removed. All the other combinations were kept. Even though not all of them occur in French lexical structure, they can still appear at inter-word boundaries. For example, the sequence /lr/ is not permitted word internally, but must be handled since it appears in the inter-word assimilation in /val rjɛ̃/ &quot;valent rien&quot; cost nothing. This is particularly important in French since inter-word liaison is common, as in /el z ɔ̃/ &quot;elles ont&quot; they have vs. /el sɔ̃/ &quot;elles sont&quot; they are, where the final consonant /s/ either undergoes liaison with the vowel /ɔ̃/ resulting in /z/, or undergoes linking with the consonant /s/ resulting in the devoiced sibilant.</Paragraph>
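      <Paragraph position="6"> The inventory arithmetic above can be sketched in Python. This is a minimal illustration, not the system's code: the consonant and vowel labels are placeholders because Table 1 is not reproduced here, the function name is hypothetical, and the sketch assumes that &quot;pairs of semi-vowels&quot; means pairs in which both members are semi-vowels. The final count follows from that assumption and is not a figure given in the text.

```python
from itertools import product

# Placeholder labels: only the class sizes (17, 15, 3) come from the text.
consonants = [f"C{i}" for i in range(1, 18)]   # 17 consonants
vowels = [f"V{i}" for i in range(1, 16)]       # 15 vowels
semivowels = ["j", "w", "ɥ"]                   # 3 semi-vowels
SILENCE = "*"                                  # silence symbol used in the paper

phonemes = consonants + vowels + semivowels    # 35 phonemes in total


def build_diphone_inventory():
    """Return the candidate diphones as a set of (left, right) tuples."""
    pairs = set(product(phonemes, repeat=2))        # 35 * 35 = 1225 pairs
    pairs |= {(SILENCE, p) for p in phonemes}       # silence-initial: 35 pairs
    pairs |= {(p, SILENCE) for p in phonemes}       # silence-final: 35 pairs
    # Assumption: only pairs made of two semi-vowels are dropped (3 * 3 = 9).
    pairs -= set(product(semivowels, repeat=2))
    return pairs


inventory = build_diphone_inventory()
print(len(inventory))  # 1225 + 70 - 9 = 1286 under the assumption above
```

Enumerating the set, rather than only computing its size, yields the list of units for which carrier words have to be found.</Paragraph>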
    </Section>
    <Section position="2" start_page="977" end_page="978" type="sub_section">
      <SectionTitle>
2.1 Diphone Structure and Selection of Carrier Word
</SectionTitle>
      <Paragraph position="0"> This section discusses the nature of the diphone set and the manner in which diphones were collected. Diphones are structured as follows: *V, *C, V*, C*, CV, VC, CC, VV, where * is a silence, C a consonant, and V a vowel. Semi-vowels were treated in the same fashion as consonants. Diphones were recorded following two different strategies: the first one consisted of picking existing words from a dictionary list. The second consisted of deciding on a neutral phonetic context using logatomes, or nonexisting words. Logatomes are phonotactically well-formed strings which do not exist as words in the current French language.</Paragraph>
      <Paragraph position="1"> Machine-readable dictionary. A word list was extracted from a subset of the Robert French dictionary \[?\] and the pronunciation fields were extracted. The dictionary contains almost 89,000 entries, of which 85,796 entries contain a headword, a phonemic transcription, and a part of speech. The remaining entries are prefixes and suffixes. The first task consisted of converting and mapping the dictionary phonemic symbols to the ones adopted in our system (shown in Table 1). This was not straightforward since there was not always a one-to-one mapping between the two sets. For handling symbol mapping, a program was written that converts any set of characters to any other set of characters (footnote 1). The program was developed so that characters coded in octal or decimal can be translated into either code, and can also be input in ASCII format for conversion (footnote 2). Quite often, there was more than one pronunciation in the phonetic field and the pattern matching program chose the pronunciation corresponding to the one required.</Paragraph>
      <Paragraph position="2"> 1 I am very grateful to Mike Tanenblatt, who wrote this program and made a succession of changes until complete flexibility of character conversion was obtained. 2 This tool allowed the conversion of databases originally written on Macintosh, PC, or Unix. Additionally, we used it to convert all the French textual databases into Latin-1 8-bit encoding format.</Paragraph>
      <Paragraph position="3"> Moreover, dictionary pronunciation fields are often not phonetically fine-grained enough for acceptable speech output (see \[?\] for a discussion of machine-readable dictionaries in text-to-speech systems). Finally, due to the lack of explicit inflectional information for nouns and adjectives, only the non-inflected forms of the entries were extracted during dictionary lookup.</Paragraph>
      <Paragraph position="4"> Similarly for verbs, only the infinitive forms were used since the dictionary does not list the inflected forms as headwords. A program was written to search through the dictionary pronunciation field and select the longest word where the phoneme pairs would be in mid-syllable position in order to avoid the extraction of phonemes occurring at the beginning or end of words. In this way, the influence of lexical stress was reduced. The orthography/pronunciation pair \[headword_orth, headword_phon\] was extracted and headword_orth was placed in a carrier sentence for recording. Out of 1225 original phonemic pairs, 874 words were</Paragraph>
      <Paragraph position="5"> found with at least one occurrence of the pair. Because 1225 is the number of all phonemic pairs in French whether they are allowed or not, it is interesting to notice that only 874 pairs occur within real words in the Robert dictionary.</Paragraph>
      <Paragraph position="6"> For the logatomes, two phonemes, /a/ and /t/, were used to encompass the selected diphone, since they appear to be fairly stable from a phonetic-acoustic standpoint. In order to balance the alternation of vowel and consonant, the words were constructed as follows:  All strings were generated in this way, even if they were not phonotactically well-formed for isolated words in the language. Nonetheless, these forms were generated and used since they were necessary for interword phenomena. Approximately 1225 words were constructed following the above model.</Paragraph>
      <Paragraph position="7"> Researchers disagree as to whether to use logatomes or real words for synthesis. The argument for using logatomes is that it is better to collect non-real words so that the diphone is recorded as neutrally as possible and does not undergo any real word stress. Those against argue that the diphone is over-articulated in a logatome environment and that this reduces the naturalness of the synthesized speech. The choice is more complex in the sense that it greatly depends on the speaker, the articulation, and the comfort in reading the two different sets. Given the controversy, in the present system we decided to record the phonemic pairs in both environments, so that we could choose the best ones.</Paragraph>
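      <Paragraph position="8"> The dictionary search described in this section (pick the longest word in which the target pair is away from the word edges, then place it in the carrier sentence) can be sketched as follows. This is only an illustration under stated assumptions: the three-word lexicon and its simplified transcriptions are invented for the example, the function name is hypothetical, and &quot;mid-syllable position&quot; is approximated by requiring that the pair touch neither the first nor the last phoneme of the word.

```python
# Toy lexicon of (orthography, phoneme list) entries; hypothetical data.
LEXICON = [
    ("ami", ["a", "m", "i"]),
    ("samedi", ["s", "a", "m", "d", "i"]),
    ("maladie", ["m", "a", "l", "a", "d", "i"]),
]

# Carrier sentence used for the recordings.
CARRIER = "C'est {word} que je dis"


def find_carrier_word(lexicon, pair):
    """Return the longest (orth, phon) entry containing `pair` word-internally.

    Returns None when the pair only occurs at a word edge or not at all.
    """
    best = None
    for orth, phon in lexicon:
        # i runs over positions where neither phon[i] nor phon[i + 1]
        # is the first or the last phoneme of the word.
        for i in range(1, len(phon) - 2):
            if (phon[i], phon[i + 1]) == pair:
                if best is None or len(phon) > len(best[1]):
                    best = (orth, phon)
                break
    return best


hit = find_carrier_word(LEXICON, ("a", "m"))
if hit is not None:
    print(CARRIER.format(word=hit[0]))  # C'est samedi que je dis
```

Pairs for which the function returns None are the ones that would have to be covered by logatomes only.</Paragraph>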
    </Section>
    <Section position="3" start_page="978" end_page="978" type="sub_section">
      <SectionTitle>
2.2 The other polyphonic units
</SectionTitle>
      <Paragraph position="0"> Due to the variability of liquids and semi-vowels, synthesis based only on diphones will not give good results. Indeed, such systems have proven to be insufficient. Researchers \[?\] argue that diphone concatenation alone is not adequate or sufficient, particularly for complex transitions. \[?\] claims that &quot;ideal diphones with perfect concatenation would give imperfect results&quot;. Complex polyphones are not equivalent to concatenated allophones. Therefore, longer concatenative units are necessary. Polyphones are defined by \[?\] as being a segmental unit where the initial and final phonemes are not subject to variability, thus excluding liquids and semi-vowels.</Paragraph>
      <Paragraph position="1"> The strategy chosen in the French system relies on some phonetic generalities to build a set of triphones. It was decided (personal communication with Joe Olive) to form a class of triphones based on the following transition: PVCc, where P is a phoneme, V a vowel, and Cc a consonant representative of the articulatory locations, i.e. one velar, one dental, and one nasal. The set consisted then of 35 phones x 14 vowels x 3 consonants = 1470 triphones. The same methodology used for building the set of diphones was used for the triphones. These were included in a carrier word for the logatomes and extracted from the dictionary for the real words.</Paragraph>
      <Paragraph position="2"> Researchers disagree on which criteria are best for the selection of triphones: should the selection rely on phonetic-acoustic evidence, or on statistical evidence related to the frequency of occurrence of triphones in the language? Then, once the criteria are defined, which triphones should be selected? Can candidates of a class (say the phoneme /p/ representing all the stops, the phoneme /v/ representing all the fricatives) be picked to represent a class, or should all the phonemes belonging to the class be selected? Research is underway in this area using a phoneme clustering approach \[?\], \[?\] that allows the selection of segmental units from a database of phonemes containing several instances of the same phoneme. The extraction is made at a spectral point common to the phonemes. Finally, because the number of selected units affects results, the choice of polyphones must be made with care. Taking into account the size limitation, one has to balance out the choice of the polyphones considering their frequency in the language.</Paragraph>
      <Paragraph position="3"> This brings in the additional complexity of corpus selection (its language properties, dialects, sociolinguistic type of language, topic, and size).</Paragraph>
      <Paragraph position="4"> \[?\] applies a series of rules on phoneme combination to exclude inter-word concatenation that would not occur in French. For example, one cannot find a glide in French that is not in the left or right context of a vowel; therefore, the combination consonant-glide-consonant is excluded. An optimal set of polyphone combinations is computed that reaches a number of 7725 units. Calculated from texts, statistics are then run on these units to determine the most frequent occurrences in French, and the number of units is lowered to 3000. It remains to be seen whether this approach is successful in a working system.</Paragraph>
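      <Paragraph position="5"> The size of the PVCc triphone class introduced in this section can be checked with a short enumeration. This is a sketch only: the phoneme and vowel labels are placeholders, and the three representative consonants are identified by the class names used in the text because the actual consonants are not named.

```python
from itertools import product

phonemes = [f"P{i}" for i in range(1, 36)]       # 35 phonemes (placeholder labels)
vowels = [f"V{i}" for i in range(1, 15)]         # 14 vowels, as in the count in the text
representatives = ["velar", "dental", "nasal"]   # one consonant per class (names assumed)

# Every P-V-Cc combination is one candidate triphone.
triphones = list(product(phonemes, vowels, representatives))
print(len(triphones))  # 35 * 14 * 3 = 1470
```

The same list could then be passed through a dictionary search or a logatome generator, as was done for the diphones.</Paragraph>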
    </Section>
    <Section position="4" start_page="978" end_page="978" type="sub_section">
      <SectionTitle>
2.3 Construction of the corpus
</SectionTitle>
      <Paragraph position="0"> A carrier sentence, &quot;C'est CARRIER_WORD que je dis&quot;, was selected to fulfill the following requirements: * short sentence to record, * ability to surround the carrier word to avoid sentential accent and effects, * phonetically neutral environment.</Paragraph>
    </Section>
    <Section position="5" start_page="978" end_page="979" type="sub_section">
      <SectionTitle>
2.4 Choice of a Speaker
</SectionTitle>
      <Paragraph position="0"> Five male native speakers of Continental French were interviewed for selecting the voice of the French synthesizer. A sample of text representing highly occurring graphemic trigrams was prepared to be used in this task. The corpus was run through a greedy algorithm (thanks to Jan Van Santen for developing and running his greedy algorithm) that returned the most frequent words within their sentences</Paragraph>
      <Paragraph position="1"> along with a measure corresponding to the coverage of the graphemic trigrams. Once the sample was recorded by the 5 speakers, the natural voices were run through LPC analysis and re-synthesized in order to judge the resistance of the voice to synthesis. Five subjects were asked to give their judgement on the following criteria: 1. clear articulation: the voice was carefully listened to in order to evaluate the articulation of the speaker. Subjective perceptual judgements were made.</Paragraph>
      <Paragraph position="2"> 2. neutral French accent: the candidate was asked about the areas of France where he grew up. The central area of France, &quot;l'Ile de France&quot;, is known for its neutral accent and is regarded as being a well-received accent. Additionally, for French native speakers residing in the USA, particular attention was paid to the influence of English in the pronunciation of French, especially for English borrowings, such as, for example, the company name AT&amp;T, to be pronounced /a te te/ (the French way) and not /eɪ ti ən ti/ as in English.</Paragraph>
      <Paragraph position="3"> 3. regularity: special attention was given to ensure that the speaker would have a reasonable degree of regularity in uttering French phonemes.</Paragraph>
      <Paragraph position="4"> 4. pleasantness of the voice: the subjects doing the evaluation were asked to give their opinion on the pleasantness of the voice, in particular the timbre, the level of nasality, and the intonation. Of course, this is a highly subjective matter but a critical one for success.</Paragraph>
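      <Paragraph position="5"> The text sample read by the candidate speakers was produced by a greedy algorithm whose details are not given in this paper. As a rough illustration of the general idea only, the sketch below implements a generic greedy set-cover selection over character trigrams, returning the chosen sentences together with a coverage measure. The function names and the sample sentences are hypothetical, and this is not claimed to be the algorithm that was actually run.

```python
def trigrams(text):
    """Return the set of character trigrams of a lowercased sentence."""
    s = text.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}


def greedy_select(sentences, max_sentences):
    """Greedily pick sentences that add the most not-yet-covered trigrams.

    Returns (chosen sentences, fraction of all trigrams covered).
    """
    target = set().union(*(trigrams(s) for s in sentences))
    covered, chosen = set(), []
    remaining = list(sentences)
    while remaining and len(chosen) != max_sentences:
        # Sentence with the largest gain in newly covered trigrams.
        best = max(remaining, key=lambda s: len(trigrams(s) - covered))
        gain = trigrams(best) - covered
        if not gain:
            break  # nothing new can be covered any more
        chosen.append(best)
        covered |= gain
        remaining.remove(best)
    coverage = len(covered) / len(target) if target else 0.0
    return chosen, coverage


sample = [
    "Elles ont dit cela samedi.",
    "Elles sont parties samedi.",
    "Je m'en vais samedi.",
]
chosen, coverage = greedy_select(sample, max_sentences=2)
print(chosen, round(coverage, 2))
```

Weighting each trigram by its corpus frequency, instead of counting every trigram once, would bring the sketch closer to a selection driven by highly occurring trigrams.</Paragraph>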
    </Section>
    <Section position="6" start_page="979" end_page="979" type="sub_section">
      <SectionTitle>
2.5 Recording Conditions
</SectionTitle>
      <Paragraph position="0"> The recording was done on four non-consecutive days under the following conditions. The sentences were recorded directly onto the computer through a DAT (Digital Audio Tape) recorder, using interactive software allowing easy reading and repetition of the sentences to be recorded. Additional time was devoted to the recording of triphones as well as the re-recording of sentences that were improperly uttered. The same carrier sentence and a regular prosodic context were carefully maintained so that there was minimal suprasegmental variation. Once the recording was done, the 48 kHz digitized acoustic signal was downsampled to 12 kHz.</Paragraph>
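      <Paragraph position="1"> Going from 48 kHz to 12 kHz is a reduction by an integer factor of 4. The paper does not say how this step was carried out; the sketch below shows one crude way to do it, averaging each block of four samples (a very simple low-pass) and keeping one value per block. The function name is hypothetical, and a production system would normally use a proper anti-aliasing filter instead of block averaging.

```python
def downsample(samples, src_rate=48000, dst_rate=12000):
    """Reduce the sampling rate by an integer factor using block averaging."""
    if src_rate % dst_rate != 0:
        raise ValueError("the source rate must be a multiple of the target rate")
    factor = src_rate // dst_rate                   # 48000 // 12000 = 4
    usable = len(samples) - len(samples) % factor   # drop an incomplete last block
    return [
        sum(samples[i:i + factor]) / factor
        for i in range(0, usable, factor)
    ]


signal = [1, 1, 1, 1, 2, 2, 2, 2, 9]
print(downsample(signal))  # [1.0, 2.0]
```
</Paragraph>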
    </Section>
    <Section position="7" start_page="979" end_page="979" type="sub_section">
      <SectionTitle>
2.6 Transcription of recording material
</SectionTitle>
      <Paragraph position="0"> For the recording, all sentences were transcribed from the phonetic alphabet to an orthographic format. This was done to allow the speaker to utter sentences with more naturalness. Once the recording was done, the sentences were semi-automatically re-transcribed into phonetic form.</Paragraph>
      <Paragraph position="1"> For some utterances, the phonetic transcription was manually adjusted to the idiosyncrasies of the speaker. For example, confusion often arises between open and closed vowels, such as in the word &quot;zoologique&quot; zoological, which can be pronounced either /zooloʒik/ or /zɔɔlɔʒik/.</Paragraph>
      <Paragraph position="2"> In case the output was /zooloʒik/ instead of the expected /zɔɔlɔʒik/, the transcription was readjusted.</Paragraph>
    </Section>
    <Section position="8" start_page="979" end_page="979" type="sub_section">
      <SectionTitle>
2.7 Segmentation
</SectionTitle>
      <Paragraph position="0"> Segmentation is presently in progress; efforts are being pursued to adapt an automatic segmentor for English to French and other languages. In the meantime, manual segmentation is being done as a pilot experiment in order to check the accuracy of automatic segmentation. Beyond the scope of this paper are many complex issues raised in segmenting French, such as the segmentation of semi-vowels (/j/, /w/, and /ɥ/) and liquids (/l/ and /r/), each of these phonemes being quite unstable from a phonetic-acoustic standpoint. These issues will be addressed in future work.</Paragraph>
    </Section>
    <Section position="9" start_page="979" end_page="979" type="sub_section">
      <SectionTitle>
2.8 Integration of an orthographic transcriber
</SectionTitle>
      <Paragraph position="0"> A grapheme-to-phoneme transcriber \[?\] was acquired to convert French orthography to a phonemic representation. The software performs some syntactic and partial semantic analysis of the sentence in order to disambiguate the input string. Once this is performed, spellings are converted in a series of steps into a phonemic representation.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="979" end_page="980" type="metho">
    <SectionTitle>
3 Issues in Text Analysis
</SectionTitle>
    <Paragraph position="0"> We have pursued work in the text analysis of French in order to obtain linguistic data for intonation and prosody; additionally, the output of the work will be used in the translation project.</Paragraph>
    <Paragraph position="1"> This aspect of the work has entailed several points: * acquisition of a large French dictionary: Robert Encyclopedic dictionary (containing over 85k entries, 80k articles, 160k citations, analogical terms (synonyms, homonyms, etc.), and conjugation tables for most French verbs), * collection of French corpora: French news from LE MONDE \[?\], French news daily compiled by the French embassy in Washington DC (24657K bytes are now encoded, and a monthly update is being done). The data are in ASCII and accents were restored using one of the features of the grapheme-to-phoneme software. Another program was written to automatically clean and normalize these e-mail format data.</Paragraph>
    <Paragraph position="2"> * extraction of some of the Robert dictionary databases: the 160,000 citations from literary French authors are being extracted so that they can constitute some relevant corpus data. A framework is being worked out so that the citation author can be retrieved on an optional basis.</Paragraph>
    <Paragraph position="3"> * encoding of French data using the already existing scheme developed by \[?\] and enhanced by \[?\]. This scheme allows the use of the concordance program. As English data are encoded in 7-bit characters, an 8-bit encoding format was worked out to allow the retrieval of French text with accents (footnote 5). For example, the unaccented word &quot;cote&quot; in French can be several words: &quot;cote&quot; with no accent meaning quotation, rating, &quot;côte&quot; meaning coast, and &quot;côté&quot; meaning side, all these translations being also valid in the figurative sense. Thus, a latin1 compatible window would display French corpora with accents; in the following example, the program returns all instances of the word &quot;cote&quot; (quotation, rating) in the database &quot;Le Monde&quot;. The query to the system will retrieve all the French sentences where the exact match to the characters &quot;cote&quot; will occur, and neither of the other spellings. The query producing Table ?? returned information from &quot;Le Monde&quot; only, as requested. In specifying &quot;FREN&quot; for French, the following query in Table ?? returns all instances of the word &quot;cote&quot; in the three French corpora. Moreover, the &quot;-i&quot; option allows the retrieval of all instances of a word with or without accent, therefore the three French words &quot;cote&quot;, &quot;côte&quot;, and &quot;côté&quot;. For more information on the use of the concordance tools, refer to \[?\]. 5 I am very grateful to David Yarowsky for encoding the database &quot;Le Monde&quot;.</Paragraph>
    <Paragraph position="4"> [Table 4: Some concordances of the word &quot;cote&quot; in all the French databases (rows from the MONDE, AFP, and HANSF corpora).]</Paragraph>
    <Paragraph position="5"> * development of a morphological analyzer and generator for French, using finite-state transducers: the system is built with an approach similar to the one for Spanish \[?\]; it is mainly based on the headwords of the Robert dictionary.</Paragraph>
    <Paragraph position="6"> * accent filters: conversion tables are still being produced each time a new database arrives that is not in a compatible form.</Paragraph>
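    <Paragraph position="7"> The difference between an exact query and an accent-insensitive one, as described for the concordance tools in this section, can be illustrated with a few lines of Python. This is a sketch, not the concordance program itself: the function names and the ignore_accents parameter are hypothetical stand-ins for the behaviour of the option mentioned above, and the token list is invented for the example.

```python
import unicodedata


def strip_accents(word):
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def concordance(tokens, query, ignore_accents=False):
    """Return the positions of tokens matching `query`.

    With ignore_accents=False only the exact spelling matches; with
    ignore_accents=True accented and unaccented variants all match.
    """
    if ignore_accents:
        key = strip_accents(query)
        return [i for i, t in enumerate(tokens) if strip_accents(t) == key]
    return [i for i, t in enumerate(tokens) if t == query]


tokens = ["la", "cote", "de", "la", "côte", "du", "côté", "gauche"]
print(concordance(tokens, "cote"))                       # [1]
print(concordance(tokens, "cote", ignore_accents=True))  # [1, 4, 6]
```

Matching on a stripped key while keeping the original tokens means the accented forms can still be displayed as they appear in the corpus.</Paragraph>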
  </Section>
</Paper>