<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-3202">
  <Title>Improving Syllabification Models with Phonotactic Knowledge</Title>
  <Section position="3" start_page="11" end_page="14" type="intro">
    <SectionTitle>
2 Method
</SectionTitle>
    <Paragraph position="0"> We build on the approach of Müller (2001), which combines the advantages of treebank and bracketed-corpora training. Her method consists of four steps: (i) writing a (symbolic, i.e. non-probabilistic) context-free phonological grammar with syllable boundaries, (ii) training this grammar on a pronunciation dictionary which contains markers for syllable boundaries (see Example 1; the pre-terminals "X[" and "X]" denote the beginning and end of a syllable, such that syllables like [strIN] can be unambiguously processed during training), (iii) transforming the resulting probabilistic phonological grammar by dropping the syllable boundary markers (see Example 2), and (iv) predicting syllable boundaries of unseen phoneme strings by choosing their most probable phonological tree according to the transformed probabilistic grammar. The syllable boundaries can be extracted from the Syl node, which governs a whole syllable.</Paragraph>
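Step (iii) above can be sketched in a few lines; the triple-based rule representation below is our own illustration, not the paper's data structure.

```python
# Sketch of step (iii): dropping the syllable-boundary pre-terminals
# from the trained probabilistic grammar. The (lhs, rhs, probability)
# triples are illustrative only.

BOUNDARY_MARKERS = {"X[", "X]", "]X"}

def drop_boundary_markers(rules):
    """Remove boundary pre-terminals from rule right-hand sides."""
    return [(lhs, [s for s in rhs if s not in BOUNDARY_MARKERS], p)
            for lhs, rhs, p in rules]

# Example (1) becomes Example (2): Word -> X[ Sylone ]X  =>  Word -> Sylone
trained = [("Word", ["X[", "Sylone", "]X"], 1.0)]
transformed = drop_boundary_markers(trained)
```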
    <Paragraph position="1"> (1) Word -> X[ Sylone ]X
(2) Word -> Sylone
We use a grammar development procedure to describe the phonological structure of words. We expect that a more fine-grained grammar increases the precision of the prediction of syllable boundaries, as more phonotactic information can be learned. In the following section, we describe the development of a series of grammars.</Paragraph>
    <Section position="1" start_page="11" end_page="14" type="sub_section">
      <SectionTitle>
2.1 Grammar development
</SectionTitle>
      <Paragraph position="0"> Our point of comparison is (i) the syllable complexity grammar, which was introduced by Müller (2002). We develop four different grammars: (ii) the phonotactic grammar, (iii) the phonotactic on-nuc grammar, (iv) the phonotactic nuc-coda grammar, and (v) the phonotactic on-nuc-coda grammar. All five grammars share the following features: the word consists of one to n syllables, which in turn branch into onset and rhyme. The rhyme is re-written by the nucleus and the coda. Onset or coda may be empty. Furthermore, all grammar versions differentiate between monosyllabic and polysyllabic words. In polysyllabic words, the syllables are divided into syllables appearing word-initially, word-medially, and word-finally. Additionally, the grammars distinguish between consonant clusters of different sizes (ranging from one to five consonants).</Paragraph>
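The positional distinction shared by all five grammars can be sketched as a small labelling function; the abbreviations (one, ini, med, fin) follow the rule features quoted later in this section.

```python
# Labelling sketch: monosyllables get their own label, polysyllables
# are split into word-initial, word-medial and word-final syllables.

def syllable_positions(n_syllables):
    """Position labels for the syllables of an n-syllable word."""
    if n_syllables == 1:
        return ["one"]
    return ["ini"] + ["med"] * (n_syllables - 2) + ["fin"]
```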
      <Paragraph position="1"> We assume that phonotactic knowledge within the onset and coda can help to solve a syllabification task. Hence, we change the rules of the syllable complexity grammar (Müller, 2002) such that phonotactic dependencies are modeled. We express the dependencies within the onset and coda, as well as the dependency on the nucleus, by bi-grams.</Paragraph>
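A minimal sketch of the bi-gram idea: within a cluster, each consonant is paired with its predecessor. The function name is ours.

```python
def cluster_bigrams(cluster):
    """(previous, current) consonant pairs inside an onset or coda."""
    return list(zip(cluster, cluster[1:]))

# For the onset [str]: [('s', 't'), ('t', 'r')]
pairs = cluster_bigrams("str")
```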
      <Paragraph position="2"> The grammars are generated automatically (using Perl scripts). As all possible phonemes in a language are known, our grammar generates all possible re-write rules. This generation process naturally over-generates, which means that we receive rules which will never occur in a language. There are, for instance, rules which describe the impossible English onset /tRS/. However, our training procedure and our training data ensure that only those rules are chosen which occur in a given language.</Paragraph>
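The over-generate-then-filter idea can be sketched as follows; the phoneme inventory and training onsets are toy examples, not the CELEX material used in the paper.

```python
# Sketch: enumerate every consonant bi-gram over a (toy) inventory,
# then keep only the bi-grams attested in training data. Impossible
# pieces such as the (t, R) and (R, S) bi-grams of /tRS/ are generated
# but never selected, because they are unattested.
from itertools import product

CONSONANTS = ["s", "t", "r", "S", "R"]  # toy inventory

def all_bigram_rules(consonants):
    return set(product(consonants, repeat=2))

def attested_rules(onsets):
    seen = set()
    for onset in onsets:
        seen.update(zip(onset, onset[1:]))
    return seen

generated = all_bigram_rules(CONSONANTS)
used = attested_rules(["str", "st"])
```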
      <Paragraph position="3"> The monosyllabic English word string is used as a running example to demonstrate the differences between the grammar versions. The word string is transcribed in the pronunciation dictionary CELEX as [strIN] (Baayen et al., 1993). The opening square bracket, "[", indicates the beginning of the syllable and the closing bracket, "]", the end of the syllable. The word consists of the tri-consonantal onset [str], followed by the nucleus, the short vowel [I], and the coda [N].</Paragraph>
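The onset/nucleus/coda split of the running example can be sketched by locating the maximal vowel run; the vowel set below is an illustrative subset of the transcription alphabet, chosen only to cover the example.

```python
VOWELS = set("IEaOUi@")  # illustrative subset, not the full inventory

def split_syllable(phonemes):
    """Split a syllable string into (onset, nucleus, coda).

    Assumes the syllable contains at least one vowel.
    """
    first = next(i for i, p in enumerate(phonemes) if p in VOWELS)
    last = first
    while last + 1 != len(phonemes) and phonemes[last + 1] in VOWELS:
        last += 1
    return phonemes[:first], phonemes[first:last + 1], phonemes[last + 1:]

parts = split_syllable("strIN")  # the running example [strIN]
```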
      <Paragraph position="4"> In the following paragraphs, we will introduce the different grammar versions. For reasons of comparison, we briefly describe the grammar of Müller (2002) first.</Paragraph>
      <Paragraph position="5"> Syllable complexity grammar (Müller, 2002): The syllable complexity grammar distinguishes between onsets and codas which contain a different number of consonants. There are different rules which describe zero- to n-consonantal onsets. Tree (3) shows the complete analysis of the word string.</Paragraph>
      <Paragraph position="6"> Rule (4) describes the tri-consonantal onset [str]. This rule occurs in example tree (3) and will be used for words such as string or spray. Rule (5) describes a two-consonantal onset occurring in the analysis of words such as snake or stand. However, this grammar cannot model phonotactic dependencies on the previous consonant.</Paragraph>
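Under our reading of the syllable complexity grammar, an onset rule records only the cluster size, so all consonants rewrite to the same undifferentiated pre-terminal. A sketch (the rule-name spelling is our assumption):

```python
def onset_rule(position, size):
    """Size-only onset rule, e.g. for the tri-consonantal onset [str]."""
    lhs = "On{}.{}".format(position, size)
    return lhs + " -> " + " ".join(["C"] * size)

rule = onset_rule("one", 3)  # covers string, spray, ...
```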
      <Paragraph position="7"> Thus, we develop a phonotactic grammar which differs from the previous one: a consonant in the onset or coda now depends on the preceding one. The rules express bi-grams of the onset and coda consonants. The main difference from the previous grammar can be seen in the re-writing rules involving phonemic pre-terminal nodes (rule 6) as well as terminal nodes for consonants (rule 7).</Paragraph>
      <Paragraph position="8"> (6) X.r.C.s.t -> C X.r.C+.s.t
(7) X.r.C.s.t -> C
Rules of this type bear four features for a consonant C inside an onset or a coda (X=On, Cod), namely: the position of the syllable in the word (r=ini, med, fin, one), the current terminal node (C = consonant), the succeeding consonant (C+), the cluster size (t = 1...5), and the position of a consonant within a cluster (s = 1...5).</Paragraph>
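Rules (6) and (7) form a chain over a cluster; a sketch for one cluster follows, where each node name carries the features X, r, C, s, and t described above (the exact spelling of the names is our assumption).

```python
def bigram_chain(x, position, cluster):
    """Build the rule chain (6)-(7) for one onset or coda cluster."""
    size = len(cluster)

    def node(i):
        # Node name with features X.r.C.s.t
        return "{}.{}.{}.{}.{}".format(x, position, cluster[i], i + 1, size)

    rules = []
    for i, consonant in enumerate(cluster):
        if i + 1 == size:                      # rule (7): terminal case
            rules.append(node(i) + " -> " + consonant)
        else:                                  # rule (6): C then successor
            rules.append(node(i) + " -> " + consonant + " " + node(i + 1))
    return rules

chain = bigram_chain("On", "one", "str")  # the onset of the running example
```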
      <Paragraph position="9"> The example tree (8) shows the analysis of the word string with the current grammar version. Rule (9) is taken from the example tree and shows that the onset consonant [t] depends on the previous consonant [s].</Paragraph>
      <Paragraph position="10"> We also examine whether there are dependencies of the first onset consonant on the succeeding nucleus. The dependency of the whole onset on the nucleus is indirectly encoded by the bi-grams within the onset. The phonotactic onset-nucleus grammar distinguishes between identical onsets with different nuclei. In example tree (12), the tri-consonantal onset starting with the phoneme [s] depends on the nucleus [I]. Rule (10) occurs in tree (12) and will also be used for words such as strict or strip, whereas rule (11) is used for words such as strong or strop.</Paragraph>
      <Paragraph position="11"> The phonotactic nucleus-coda grammar encodes the dependency of the first coda consonant on the nucleus. The grammar distinguishes between codas that occur with various nuclei. Rule (13) is used, for instance, to analyze the word string, shown in example tree (15). The same rule will be applied for words such as bring, king, ring or thing. If there is a different nucleus, we get a different set of rules. Rule (14), e.g., is required to analyze words such as long, song, strong or gong.</Paragraph>
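The nucleus-conditioned coda rules can be sketched as follows; the rule-name spelling is our loose reading of rules (13) and (14), with the nucleus carried on the left-hand side so that [IN] (string, bring) and [ON] (long, song) select different rules.

```python
def nucleus_coda_rule(position, nucleus, coda):
    """Coda rule whose left-hand side is conditioned on the nucleus."""
    size = len(coda)
    lhs = "Coda{}.{}.{}".format(position, nucleus, size)
    first = "Co{}.{}.1.{}".format(position, coda[0], size)
    return lhs + " -> N " + first

rule_string = nucleus_coda_rule("one", "I", "N")  # string, bring, king, ...
rule_long = nucleus_coda_rule("one", "O", "N")    # long, song, strong, ...
```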
      <Paragraph position="12"> (13) Codaone.I.1 -> N Coone.N.1.1
(14) Codaone.O.1 -> N Coone.N.1.1
(15) [analysis tree of the word string, not reproduced]
The last tested grammar is the phonotactic onset-nucleus-coda grammar. It is a combination of the grammars of Sections 2.1.4 and 2.1.5. In this grammar, the first consonants of the onset and the coda depend on the nucleus. Tree (16) shows the full analysis of our running example word string.</Paragraph>
      <Paragraph position="13">  The rules of the subtree (17) are the same for words such as string or spring. However, words with a different nucleus such as strong will be analyzed with a different set of rules.</Paragraph>
    </Section>
  </Section>
</Paper>