<?xml version="1.0" standalone="yes"?> <Paper uid="P95-1001"> <Title>Learning Phonological Rule Probabilities from Speech Corpora with Exploratory Computational Phonology</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 The Algorithm </SectionTitle> <Paragraph position="0"> In this section we describe our algorithm, which assigns probabilities to hand-written, optional phonological rules such as flapping. The algorithm takes a lexicon of underlying forms and applies phonological rules to produce a new lexicon of surface forms. We then run a speech recognition system on a large corpus of recorded speech to count how many times each of these surface forms occurred in the corpus.</Paragraph> <Paragraph position="1"> Finally, because we know which rules were used to generate each surface form, we can compute a count for each rule. By combining this with a count of the times a rule did not apply, the algorithm can compute a probability for each rule.</Paragraph> <Paragraph position="2"> The rest of this section discusses each aspect of the algorithm in detail.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 The Base Lexicon </SectionTitle> <Paragraph position="0"> Our base lexicon is quite large; it is used to generate the lexicons for all of our speech recognition work at ICSI. It contains 160,000 entries (words) with 300,000 pronunciations. The lexicon contains underlying forms which are very shallow; they are post-lexical in the sense that no relationship is represented between, e.g., 'critic' and 'criticism' (where 'critic' is pronounced kritik and 'criticism' kritisizm). However, the entries do not represent flaps, vowel reductions, and other coarticulatory effects. 
In order to collect our 300,000 pronunciations, we combined seven different on-line pronunciation dictionaries, including the five shown in Table 1.</Paragraph> <Paragraph position="1"> expanded lexicon.</Paragraph> <Paragraph position="2"> For further information about these sources please refer to CMU (CMU 1993), LIMSI (Lamel 1993),</Paragraph> </Section> </Section> </Paper>