File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/94/c94-2156_intro.xml

Size: 11,762 bytes

Last Modified: 2025-10-06 14:05:39

<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-2156">
  <Title>Machine-Readable Dictionaries in Text-to-Speech Systems</Title>
  <Section position="2" start_page="0" end_page="972" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The majority of speech synthesis systems use two techniques: concatenation and formant synthesis. Building a comprehensive and intelligible concatenative speech synthesis system relies heavily on the successful choice of concatenative units. Our results contribute to the task of developing an efficient and effective methodology for reducing the potentially large set of concatenative units to a manageable size, and for choosing the optimal set for recording and storage.</Paragraph>
    <Paragraph position="1"> The paper is aimed primarily at two audiences: one consists of those concerned with research on the automatic use of MRD data; the other consists of TTS system designers who require linguistic and lexicographic resources to improve and streamline system-building. Issues of morphological analysis and generation, as well as stress assignment based on dictionary data, are discussed.</Paragraph>
    <Paragraph position="2"> 2 Using MRDs in Text to Speech. Several problems are addressed in this paper. One concerns the subtle complexities and idiosyncrasies involved in parsing dictionaries and extracting data. Added to this is the lack of consistency both within the same dictionary and across dictionaries, which often requires ad hoc procedures for each resource. Another issue relates to the structure of the modules of a TTS system, specifically in the grapheme-to-phoneme component; dictionary lookup depends on several factors including size, machine power and storage, factors that have important consequences for the extraction of concatenative units. Another consideration concerns the nature of the language itself: a language with irregular grapheme-to-phoneme mapping and lexically determined stress assignment (such as English) benefits most from the large exception list which a dictionary can provide. There is also the practical issue of dictionary availability, and of pronunciation field accuracy within an available dictionary. Thus, decisions on the use of MRD data depend on many factors, and can significantly impact the efficiency and accuracy of a speech system.</Paragraph>
    <Paragraph position="3"> Since a dictionary entry consists of several fields of information, each will naturally be useful for different applications [1]. Among the standard fields are pronunciation, etymology, subject field notes, definition fields, synonym and antonym cross references, semantic and syntactic comments, run-on forms, conjugational class and inflectional information where relevant, and translation for the bilingual dictionaries. Each of these fields has proven useful for different applications, such as building semantic taxonomies [3], [13] and machine translation [12]. The most directly useful for TTS is the pronunciation field [4], [11]. Equally useful for TTS, but less directly accessible, are data from run-on fields, conjugational class information, and part-of-speech (see footnote 1). The following partial entries from Webster's Seventh (W7) [15] illustrate typical pronunciation, definition, and run-on fields: (1) ha.ven /'hā-vən/ n 1 : HARBOR, PORT 2 : a place of safety : ASYLUM haven vt  (2) bi.son /'bīs-ən, 'bīz-/ n ...</Paragraph>
    <Paragraph position="4"> (3) ho.mo.ge.neous /-'jē-nē-əs, -nyəs/ ...</Paragraph>
    <Paragraph position="5"> (4) den.tic.u.late /den-'tik-yə-lət/ or den.tic.u.lat.ed /-,lāt-əd/ adj  The entry for &quot;haven&quot; contains one full pronunciation. The entry for &quot;bison&quot; has one alternative, but the user must figure out that the /ən/ should be appended after /'bīz-/, as in the first pronunciation, in order to obtain the correct variant. Correct pronunciation for &quot;homogeneous&quot; relies on the pronunciation of the previous entry, &quot;homogeneity&quot;, and requires the user to separate the prefix &quot;homo-&quot; and carry it from one entry to the other. To complicate matters, the alternative pronunciations for the suffix, /nē-əs/ and /-nyəs/, must also be correctly interpreted by the user. Finally, &quot;denticulate&quot; has a morphologically related run-on form &quot;denticulated&quot; in the early part of the entry, and the pronunciation of that run-on is related to the main entry, but the user must decide how to strip and append the given syllables (see footnote 2).</Paragraph>
    <Paragraph position="6"> While these types of reasoning are not difficult for humans, for whom the dictionary was written, they are quite difficult for programs, and thus are not straightforward to perform automatically.</Paragraph>
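    To make the difficulty concrete, the following Python sketch expands a truncated variant against a full pronunciation by aligning hyphen-separated syllables. It is an illustration only, not the procedure used in this work: the function name expand_variant and the convention that a trailing hyphen replaces leading syllables while a leading hyphen replaces trailing ones are assumptions made for this example.

```python
def expand_variant(full, partial):
    """Expand a truncated variant against a full pronunciation.

    Assumed convention (hypothetical, for illustration):
      - "'bīz-"  (trailing hyphen) replaces the leading syllables of `full`
      - "-xyz"   (leading hyphen) replaces the trailing syllables of `full`
    Syllables are aligned one-for-one, which is a naive simplification.
    """
    base = full.split("-")
    if partial.endswith("-") and not partial.startswith("-"):
        head = partial[:-1].split("-")
        # keep the syllables of the full form that the variant does not cover
        return "-".join(head + base[len(head):])
    if partial.startswith("-") and not partial.endswith("-"):
        tail = partial[1:].split("-")
        return "-".join(base[:len(base) - len(tail)] + tail)
    raise ValueError("variant is not marked as truncated: " + partial)


# Example (2): the alternative for "bison" borrows the final syllable.
print(expand_variant("'bīs-ən", "'bīz-"))
```

    One-for-one alignment handles example (2), but it misaligns example (4), where the run-on has one more syllable than the headword, and the /-nyəs/ alternative in example (3), which replaces two syllables with one. Cases like these are what make automatic extraction hard.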
    <Section position="1" start_page="971" end_page="972" type="sub_section">
      <SectionTitle>
2.1 Using the MRD pronunciation field
</SectionTitle>
      <Paragraph position="0"> Extracting the pronunciation field from an MRD is one of the most obvious uses of a dictionary. Nevertheless, parsing dictionaries in general can be a very complex operation [16], and even the extraction of one field, such as pronunciation, can pose problems. As in W7, in the Robert French dictionary [9], which contains about 89,000 entries, several pronunciations can be given for a headword, and one must be chosen. Footnote 1: Notice, however, that the full Collins Spanish-English dictionary [7], as opposed to the other bilinguals, does not contain any pronunciation information. Although this is rather surprising, given that the smaller versions such as the paperback and gem ([8], [10]) do list a phonetic field, it could be attributed to the fact that pronunciation rules in Spanish are relatively predictable. Footnote 2: [2] reports on the need to resyllabify entries already syllabified in LDOCE [18], since syllable boundaries for written forms usually reflect hyphenation conventions, rather than the phonologically motivated syllabification conventions necessary for pronunciation.</Paragraph>
      <Paragraph position="1"> Moreover, because of the rich morphology of French, which has a rough ratio of eight morphologically inflected words for one baseform, Robert lists only the non-inflected forms of the lexical entries. However, if pronunciation varies during inflection of nouns and adjectives, the pronunciation field reflects that variation, which makes the information difficult to extract automatically. For example, in (5) and (6), one needs to know the nature of the rule to apply in order to relate both forms of the adjective.</Paragraph>
      <Paragraph position="2"> (5) blanc, blanche /blɑ̃, blɑ̃ʃ/ adj. et n.</Paragraph>
      <Paragraph position="3"> (6) vif, vive /vif, viv/ adj. et n.</Paragraph>
      <Paragraph position="4">  In (5), the masculine /blɑ̃/ is obtained by removing the phoneme /ʃ/ from the feminine /blɑ̃ʃ/ (blanche, &quot;white&quot;). In (6), the masculine form /vif/ (&quot;sharp, quick&quot;) is formed by stripping the affix /v/ and substituting the phoneme /f/.</Paragraph>
      <Paragraph position="5"> Notice that the rules are different in nature, the first being an addition/deletion relation, and the second being a substitution.</Paragraph>
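      The difference between the two relations can be sketched in Python by comparing the masculine and feminine phoneme sequences after their shared prefix. This is a minimal illustration, not part of the system described here; the function name classify_alternation and the representation of pronunciations as lists of phoneme symbols are assumptions.

```python
def classify_alternation(masc, fem):
    """Classify how a masculine/feminine pronunciation pair differs.

    Both arguments are lists of phoneme symbols (an assumed representation).
    Returns a label plus the unshared remainders of each form.
    """
    shared = 0
    for m, f in zip(masc, fem):
        if m != f:
            break
        shared += 1
    masc_rest, fem_rest = masc[shared:], fem[shared:]
    if masc_rest and fem_rest:
        label = "substitution"
    elif masc_rest or fem_rest:
        label = "addition/deletion"
    else:
        label = "identical"
    return label, masc_rest, fem_rest


# Example (5): blanc / blanche
print(classify_alternation(["b", "l", "ɑ̃"], ["b", "l", "ɑ̃", "ʃ"]))
# Example (6): vif / vive
print(classify_alternation(["v", "i", "f"], ["v", "i", "v"]))
```

      The sketch only labels a pair that is already segmented. Deciding which rule to apply when generating one form from the other still requires knowledge that the dictionary entry does not state.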
      <Paragraph position="6"> In this project, the dictionary pronunciation field was used to start building the phonetic inventory of a speech synthesis system. For the French TTS system [?], the set of diphones was established by taking most of the thirty-five phonemes for French and coupling them with each other (35² = 1225 pairs). Then, the diphones were extracted from the pronunciation field for headwords in the Robert dictionary. A program was written to search through the dictionary phonetic field and select the longest word where the phoneme pairs would be in mid-syllable position. For example, the phonemic pair /lo/ was found in the pronunciation field /zooloʒik/ corresponding to the headword zoologique &quot;zoologic.&quot; Out of 1225 phonemic pairs, 874 words were found with at least one occurrence of the pair.</Paragraph>
      <Paragraph position="7"> The pair [headword_orth, headword_phon] was extracted and headword_orth was placed in a carrier sentence for recording. For instance, the speaker would utter the following sentence: &quot;C'est zoologique que je dis&quot; where &quot;C'est ... que je dis&quot; is the carrier sentence. Due to the lack of explicit inflectional information for nouns and adjectives, only the non-inflected forms of the entries were extracted during dictionary lookup for building the diphone table. Similarly for verbs, only the infinitive forms were used since the dictionary does not list the inflected forms as headwords. This exemplifies the simplest way to use pronunciation field data, which we have completed. A pronunciation list of 85,796 phonetic words was obtained from the original list of almost 89,000 entries, i.e. 96% of the entries. The remaining 4% consist primarily of prefixes and suffixes which are listed in the dictionary without pronunciations, and which should not be used in isolation in any case.</Paragraph>
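      The selection procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the program actually used: the function names are hypothetical, the lexicon is assumed to map each headword to a list of phoneme symbols, and "mid-syllable position" is approximated by requiring that the pair touch neither the first nor the last phoneme of the word.

```python
from itertools import product


def diphone_inventory(phonemes):
    """All ordered phoneme pairs: n phonemes give n * n candidate diphones."""
    return list(product(phonemes, repeat=2))


def pick_carrier_words(lexicon, wanted):
    """For each wanted pair, keep the longest headword containing it.

    lexicon: dict mapping headword_orth to headword_phon (list of phonemes).
    Only word-internal pairs are considered (an approximation of the
    mid-syllable condition).
    """
    chosen = {}
    for orth, phon in lexicon.items():
        for i in range(1, len(phon) - 2):
            pair = (phon[i], phon[i + 1])
            if pair not in wanted:
                continue
            best = chosen.get(pair)
            if best is None or len(phon) > len(best[1]):
                chosen[pair] = (orth, phon)
    return chosen


def carrier_sentence(orth):
    """Embed the headword in the carrier sentence used for recording."""
    return f"C'est {orth} que je dis"


# Toy lexicon; the phoneme lists are illustrative.
lexicon = {"zoologique": list("zooloʒik"), "loto": list("loto")}
chosen = pick_carrier_words(lexicon, {("l", "o")})
print(len(diphone_inventory(range(35))))
print(carrier_sentence(chosen[("l", "o")][0]))
```

      Pairs for which no headword is found stay absent from the result, which corresponds to the pairs left uncovered in the counts above.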
    </Section>
    <Section position="2" start_page="972" end_page="972" type="sub_section">
      <SectionTitle>
2.2 Using the MRD for morphology
</SectionTitle>
      <Paragraph position="0"> Even though an MRD may not list complete inflectional paradigms, it contains useful inflectional information. For example, in the Collins Spanish-English dictionary, verb entries are listed with an index pointing to the conjugation class and table, listed at the end of the dictionary. Using this information, a finite-state transducer for morphological analysis and generation was built for Spanish [20].</Paragraph>
      <Paragraph position="1"> From the original list of over 50,000 words, a few million words have been generated. These forms can then be used as the input to the grapheme-to-phoneme conversion module in a Spanish TTS system.</Paragraph>
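      The idea of expanding a headword through its conjugation class can be illustrated with a simple table lookup in Python. This is deliberately not a finite-state transducer and does not reproduce the Collins tables: the class label "ar-regular", the PARADIGMS table, and the function inflect are hypothetical names, and only the present indicative of regular -ar verbs is covered.

```python
# Hypothetical paradigm table: class label -> (infinitive ending, suffixes).
# Only the present indicative of regular -ar verbs is listed here.
PARADIGMS = {
    "ar-regular": ("ar", ["o", "as", "a", "amos", "áis", "an"]),
}


def inflect(lemma, class_id):
    """Generate inflected forms for a lemma from its conjugation class."""
    ending, suffixes = PARADIGMS[class_id]
    if not lemma.endswith(ending):
        raise ValueError(lemma + " does not belong to class " + class_id)
    stem = lemma[: -len(ending)]
    return [stem + suffix for suffix in suffixes]


print(inflect("hablar", "ar-regular"))
```

      With full paradigm tables, each headword yields many forms, which is how a list of some 50,000 entries can grow to a few million words.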
    </Section>
    <Section position="3" start_page="972" end_page="972" type="sub_section">
      <SectionTitle>
2.3 Using Run-on's
</SectionTitle>
      <Paragraph position="0"> A run-on is defined as a morphological variant of a headword, included in the entry. Run-on's are problematic data in MRDs [16], and they can be found nearly anywhere in the entry. In example (4), the run-on occurs at the beginning of the entry, and consists of a full form with suffix. More commonly, run-on's occur towards the end of the entry, and tend to consist of predictable suffixation, that is, class II or neutral suffixes [19], such as -ness, -ly, or -er, as in: (7) sharp adj .... sharp.ly adv sharp.ness n (8) suc.ces.sion n .... suc.ces.sion.al adj suc.ces.sion.al.ly adv In cases where stress is changed with class I non-neutral suffixes, a separate pronunciation is given as in: (9) gy.ro.scope /'jī-rə-,skōp/ n ....</Paragraph>
      <Paragraph position="1"> gy.ro.scop.ic /jī-rə-'skäp-ik/ adj gy.ro.scop.i.cal.ly /-i-k(ə-)lē/ adv  The run-on form with part-of-speech is given inside the entry, so it could be used for morphological analysis. However, since pronunciation is usually predictable from the headword (i.e. there is usually no stress change, and if there is a change, this is explicitly indicated), the run-on pronunciation often consists of a truncated form, requiring some logic for reconstruction of the entire pronunciation. Again, this may be obvious to the human user, but rather complex to figure out by program. Thus, the run-on may be useful for morphology, but is not as useful for automatic pronunciation extraction.</Paragraph>
    </Section>
  </Section>
</Paper>