<?xml version="1.0" standalone="yes"?>
<Paper uid="H94-1055">
<Title>Phonological Parsing for Bi-directional Letter-to-Sound/Sound-to-Letter Generation</Title>
<Section position="3" start_page="0" end_page="289" type="relat">
<SectionTitle>
PREVIOUS WORK
Letter-to-Sound Generation
</SectionTitle>
<Paragraph position="0"> One of the first approaches adopted for letter-to-sound generation is typified by MITalk [8]. It follows the theories of generative grammar and the transformational cycle as proposed by Chomsky and Halle [3]. A large set of ordered cyclical rules is applied in turn to the word in question until a final pronunciation emerges. While the process of establishing the appropriate rule set was tedious and time-consuming, the resulting system achieved a degree of accuracy that, to our knowledge, has not yet been matched by other, more automatic techniques.</Paragraph>
<Paragraph position="1"> Because the generation of cyclical rules is a difficult and complicated task, several research groups have attempted to acquire letter-to-sound generation systems through automatic or semi-automatic data-driven techniques, based on neural nets or on an information-theoretic approach. Typically, the goal is to provide as little a priori information as possible: ideally, only a set of pairings of letter sequences with corresponding (aligned or unaligned) phone sequences. Iterative training algorithms then produce a probability model that is applied to predict the most likely pronunciation. Probably the best known of these systems is NETtalk [4], which learns the pronunciation of the current letter by considering the six surrounding letters as input to the neural network.</Paragraph>
<Paragraph position="2"> Lucassen and Mercer [5] acquired a set of rules automatically from a large lexicon of phonetically labelled data by growing decision trees using a criterion based on mutual information. Although direct comparisons of the performance of different systems are difficult due to the lack of standardized phone sets, data sets, or scoring algorithms, these systems have reported phone accuracies in the low 90s in terms of the percentage of letters correctly pronounced.</Paragraph>
<SectionTitle>
Sound-to-Letter Generation
</SectionTitle>
<Paragraph position="3"> To our knowledge, there has been very little previous work reported in the literature addressing the problem of sound-to-letter generation. We are aware of only two prior research efforts in this area.</Paragraph>
<Paragraph position="4"> Lucas and Damper [6] developed a system for bi-directional text-phonetics translation using two neural networks to perform statistical string translation. This system does not require pre-aligned text-phonetic pairs for training, but instead tries to infer appropriate segmentations and alignments. In a phonetics-to-text translation task using two disjoint 2,000-word corpora for training and testing, they reported a 71.3% letter accuracy and a 22.8% word accuracy.</Paragraph>
<Paragraph position="5"> Another related effort was conducted by Alleva and Lee [7], who used HMMs to model the acoustics of training sentences based on the orthographic transcriptions. Context-dependent quad-letter acoustic models were trained with 15,000 sentences and used in conjunction with a 5-gram letter language model. Testing on a disjoint corpus of 30 embedded and endpoint-detected words (place and ship names) gave a 39.3% letter error rate and a 21.1% word accuracy. However, this result is not directly comparable to our work because the phonemic/phonetic representation is bypassed.</Paragraph>
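To make the rule-driven MITalk approach above concrete, the following is a minimal sketch of ordered rewrite-rule application. The rules themselves are invented for illustration, and a single left-to-right pass stands in for the full transformational cycle described by Chomsky and Halle.

    import re

    # A minimal sketch of ordered rewrite-rule application in the spirit of
    # MITalk's letter-to-sound rules. The rules below are invented for
    # illustration; the real rule set is far larger and conditioned on
    # morphological and stress context.
    RULES = [
        (r"ph", "f"),           # 'ph' -> /f/
        (r"igh", "AY"),         # 'igh' -> the diphthong /ay/
        (r"c(?=[eiy])", "s"),   # soften 'c' before front vowels
        (r"c", "k"),            # otherwise 'c' is hard
    ]

    def apply_rules(word: str) -> str:
        # Apply each rule in order; the output of earlier rules feeds later ones.
        s = word.lower()
        for pattern, replacement in RULES:
            s = re.sub(pattern, replacement, s)
        return s

    print(apply_rules("cipher"))  # -> 'sifer' (illustrative, not a real phone string)
    print(apply_rules("night"))   # -> 'nAYt'

Rule ordering matters here exactly as in the cyclical-rule framework: the context-sensitive 'c' rule must precede the default one, or every 'c' would be hardened to /k/.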
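For the data-driven line of work, a minimal sketch of NETtalk-style input encoding follows: the phone for each letter is predicted from a seven-letter window (the letter plus the three neighbors on each side implied by the six-surrounding-letters description above), one-hot encoded. The padding symbol and 27-symbol alphabet are simplifying assumptions; NETtalk's actual symbol set and the network itself are omitted.

    # A sketch of windowed letter features of the kind fed to NETtalk.
    ALPHABET = "_abcdefghijklmnopqrstuvwxyz"   # '_' pads beyond the word edges

    def window_features(word: str, i: int, half: int = 3) -> list:
        padded = "_" * half + word.lower() + "_" * half
        window = padded[i : i + 2 * half + 1]  # seven letters centered on letter i
        features = []
        for ch in window:
            one_hot = [0] * len(ALPHABET)
            one_hot[ALPHABET.index(ch)] = 1
            features.extend(one_hot)
        return features  # 7 * 27 = 189 inputs per letter

    feats = window_features("cat", 1)  # features for the 'a' in "cat"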
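Likewise, a sketch of the mutual-information criterion in the spirit of Lucassen and Mercer's tree growing: each candidate yes/no question about the letter context is scored by how much knowing its answer reduces uncertainty about the phone label. The sample format and the questions here are hypothetical.

    from collections import Counter
    from math import log2

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def mutual_information(samples, question):
        # I(answer; phone) = H(phone) - H(phone | answer) for a yes/no question.
        labels = [label for _, label in samples]
        yes = [label for feats, label in samples if question(feats)]
        no = [label for feats, label in samples if not question(feats)]
        if not yes or not no:
            return 0.0
        h_cond = (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(labels)
        return entropy(labels) - h_cond

    def best_question(samples, questions):
        # Greedily pick the question that tells us most about the phone label.
        return max(questions, key=lambda q: mutual_information(samples, q))

    samples = [({"next": "h"}, "f"), ({"next": "o"}, "p"), ({"next": "h"}, "f")]
    q = best_question(samples, [lambda f: f["next"] == "h"])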
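Finally, a sketch of a 5-gram letter language model of the kind Alleva and Lee combined with their quad-letter acoustic models. The estimates here are unsmoothed maximum likelihood from counts; a real system would smooth to handle unseen letter sequences.

    from collections import defaultdict

    class LetterNgram:
        def __init__(self, n: int = 5):
            self.n = n
            self.gram_counts = defaultdict(int)
            self.context_counts = defaultdict(int)

        def train(self, words):
            for word in words:
                s = "^" * (self.n - 1) + word.lower() + "$"  # boundary padding
                for i in range(len(s) - self.n + 1):
                    gram = s[i : i + self.n]
                    self.gram_counts[gram] += 1
                    self.context_counts[gram[:-1]] += 1

        def prob(self, context: str, letter: str) -> float:
            # P(letter | previous n-1 letters), unsmoothed.
            gram = context[-(self.n - 1):] + letter
            denom = self.context_counts[gram[:-1]]
            return self.gram_counts[gram] / denom if denom else 0.0

    lm = LetterNgram(5)
    lm.train(["ship", "shape", "shore"])
    p = lm.prob("^^^s", "h")  # P('h' | word-initial 's') == 1.0 on this toy corpus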
</Section>
</Paper>