<?xml version="1.0" standalone="yes"?>
<Paper uid="A00-2040">
<Title>A Finite State and Data-Oriented Method for Grapheme to Phoneme Conversion</Title>
<Section position="2" start_page="0" end_page="0" type="intro">
<SectionTitle> 1 Introduction </SectionTitle>
<Paragraph position="0"> Automatic grapheme to phoneme conversion (i.e. the conversion of a string of characters into a string of phonemes) is essential for applications of text to speech synthesis dealing with unrestricted text, where the input may contain words which do not occur in the system dictionary.</Paragraph>
<Paragraph position="1"> Furthermore, a transducer for grapheme to phoneme conversion can be used to generate candidate replacements in a (pronunciation-sensitive) spelling correction system. When given the pronunciation of a misspelled word, the inverse of the grapheme to phoneme transducer will generate all identically pronounced words. Below, we present a method for developing such grapheme to phoneme transducers based on a combination of hand-crafted conversion rules, implemented using finite state calculus, and automatically induced rules.</Paragraph>
<Paragraph position="2"> The hand-crafted system is defined as a two-step procedure: segmentation of the input into a sequence of graphemes (i.e. sequences of one or more characters typically corresponding to a single phoneme) and conversion of graphemes into (sequences of) phonemes. The composition of the transducer which performs segmentation and the transducer defined by the conversion rules is a transducer which converts sequences of characters into sequences of phonemes.</Paragraph>
<Paragraph position="3"> Specifying the conversion rules is a difficult task.</Paragraph>
<Paragraph position="4"> Although segmentation of the input can in principle be dispensed with, we found that writing conversion rules for segmented input substantially reduces the context-sensitivity and order-dependence of such rules.
We manually developed a grapheme to phoneme transducer for Dutch data obtained from CELEX (Baayen et al., 1993) and achieved a word accuracy of 60.6% and a phoneme accuracy of 93.6%.</Paragraph>
<Paragraph position="5"> To improve the performance of our system, we used transformation-based learning (TBL) (Brill, 1995). Training data are obtained by aligning the output of the hand-crafted finite state transducer with the correct phoneme strings. These data can then be used as input for TBL, provided that suitable rule templates are available. We performed several experiments, in which the amount of training data, the algorithm (Brill's original formulation and 'lazy' variants (Samuel et al., 1998)), and the number of rule templates varied. The best experiment (40K words, using a 'lazy' strategy with a large set of rule templates) induces over 2000 transformation rules, leading to 92.6% word accuracy and 99.0% phoneme accuracy. This result, obtained using a relatively small set of training data, compares well with that of other systems.</Paragraph>
</Section>
</Paper>
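<!--
Editorial illustration of the two-step procedure described in the Introduction
(segmentation into graphemes, then conversion of graphemes into phonemes, with
the two steps composed). This is a minimal Python sketch under stated
assumptions, not the authors' implementation: it uses plain dictionary lookup
in place of finite state transducers, it ignores context, and the grapheme
table, the function names and the example words are all hypothetical.

```python
# Toy grapheme inventory and conversion table (hypothetical, illustration only).
GRAPHEME_TO_PHONEME = {
    "oe": "u",
    "ch": "x",
    "b": "b",
    "k": "k",
    "n": "n",
    "s": "s",
}
MAX_GRAPHEME_LEN = max(len(g) for g in GRAPHEME_TO_PHONEME)


def segment(word):
    """Step 1: split a word into graphemes by greedy longest match."""
    graphemes = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(MAX_GRAPHEME_LEN, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in GRAPHEME_TO_PHONEME:
                graphemes.append(chunk)
                i += size
                break
        else:
            raise ValueError(f"no grapheme matches at position {i} of {word!r}")
    return graphemes


def convert(graphemes):
    """Step 2: map each grapheme to a phoneme, without looking at context."""
    return [GRAPHEME_TO_PHONEME[g] for g in graphemes]


def g2p(word):
    """Composition of the two steps: characters in, phonemes out."""
    return convert(segment(word))


def word_accuracy(predicted, gold):
    """Fraction of words whose full phoneme sequence equals the gold one."""
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)
```

Calling g2p("schoen") first segments the word into s, ch, oe, n and then maps
each grapheme to a phoneme. Because conversion operates on graphemes rather
than single characters, the toy table needs no rule saying that "c" followed
by "h" behaves differently from "c" alone, which mirrors the reduced
context-sensitivity the text attributes to segmented input.
-->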