<?xml version="1.0" standalone="yes"?> <Paper uid="W02-0608"> <Title>Probabilistic Context-Free Grammars for Phonology</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> In this paper, we present an approach to supervised learning and automatic detection of syllable structure. The primary goal of the paper is to show that probabilistic context-free grammars can be used to gain substantial phonological knowledge about syllable structure. Beyond a quantitative evaluation of the trained model on a real-world task, which documents the model's performance, we focus on an extensive qualitative evaluation.</Paragraph> <Paragraph position="1"> In contrast to other approaches, which work with syllable structures extracted from a pronunciation dictionary, our approach focuses on the probabilities with which certain syllable structures are used. Other approaches that deal with syllable structure include example-based approaches (Hall (1992), Wiese (1996), Féry (1995), Kenstowicz (1994), Morelli (1999)), symbolic approaches (Belz, 2000), connectionist phonotactic models (Stoianov and Nerbonne, 1998), stochastic models describing partial structures (Pierrehumbert (1994), Coleman and Pierrehumbert (1997)), and application-based approaches for syllabification (Van den Bosch, 1997) or text-to-speech systems (Kiraz and Möbius, 1998).</Paragraph> <Paragraph position="2"> Our method builds on two resources. The first is a large written text corpus whose words are looked up in a pronunciation dictionary, resulting in a large transcribed and syllabified corpus. The second is a manually written context-free grammar describing German and English syllable structure. 
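To make the underlying idea concrete, the following is a minimal Python sketch of how a probabilistic syllable model can be estimated from a transcribed, syllabified corpus and then used to predict syllable boundaries of unseen phoneme strings. It is not the paper's actual grammar or training procedure: instead of a full context-free grammar and parser, it conditions onset, nucleus, and coda probabilities directly on syllable position, and the toy vowel inventory, corpus, and all function names are illustrative assumptions.

```python
from collections import defaultdict

# Simplified nucleus inventory; the paper uses full German/English phoneme
# sets, so this toy alphabet is an assumption for illustration only.
VOWELS = set("aeiou")

def split_syllable(syl):
    """Split a syllable into (onset, nucleus, coda) around its vowel span."""
    i = 0
    while i < len(syl) and syl[i] not in VOWELS:
        i += 1
    j = i
    while j < len(syl) and syl[j] in VOWELS:
        j += 1
    return syl[:i], syl[i:j], syl[j:]

def position(k, n):
    """Syllable position: monosyllabic, word-initial, -medial, or -final."""
    if n == 1:
        return "mono"
    return "ini" if k == 0 else ("fin" if k == n - 1 else "med")

def train(corpus):
    """Supervised relative-frequency estimation from a syllabified corpus:
    P(onset / nucleus / coda | syllable position)."""
    counts = defaultdict(lambda: defaultdict(int))
    for word in corpus:
        for k, syl in enumerate(word):
            pos = position(k, len(word))
            for part, val in zip(("on", "nuc", "cod"), split_syllable(syl)):
                counts[(pos, part)][val] += 1
    return {key: {v: c / sum(d.values()) for v, c in d.items()}
            for key, d in counts.items()}

def score(model, word):
    """Probability of one candidate syllabification under the model."""
    p = 1.0
    for k, syl in enumerate(word):
        pos = position(k, len(word))
        for part, val in zip(("on", "nuc", "cod"), split_syllable(syl)):
            p *= model.get((pos, part), {}).get(val, 0.0)
    return p

def segmentations(phonemes):
    """All splits of a phoneme string into syllables that contain a nucleus."""
    if not phonemes:
        yield []
        return
    for i in range(1, len(phonemes) + 1):
        head = phonemes[:i]
        if any(ph in VOWELS for ph in head):
            for rest in segmentations(phonemes[i:]):
                yield [head] + rest

def syllabify(model, phonemes):
    """Predict syllable boundaries for an unseen phoneme string."""
    return max(segmentations(phonemes), key=lambda w: score(model, w))

corpus = [["ka", "to"], ["mi", "ka"], ["to"]]   # toy syllabified corpus
model = train(corpus)
print(syllabify(model, "kato"))                 # -> ['ka', 'to']
```

In the paper's setup, the same role is played by a trained context-free grammar plus a parser, so that phonotactic constraints on onsets and codas are expressed as grammar rules rather than as flat conditional distributions.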
We encode the assumption (similar to Goldsmith (1995)) that the phonological material that can occur in onsets or codas may differ depending on syllable position: word-initial, word-medial, word-final, or in monosyllabic words.</Paragraph> <Paragraph position="3"> We train the context-free grammar for German on the transcribed and syllabified training corpus with a simple supervised training method (Müller, 2001a). The main idea of the training method is that, after a grammar transformation step, the grammar together with a parser can predict the syllable boundaries of unknown phoneme strings. The trained model is evaluated on a syllabification task and achieves high precision on a test corpus. We demonstrate that the method can easily be transferred to related languages (here, English) by adding rules for missing phonemes to the grammar. In a qualitative evaluation, we compare German and English syllable structure by interpreting the probability weights of the preterminals. (July 2002, pp. 70-80. Association for Computational Linguistics.)</Paragraph> </Section></Paper>