<?xml version="1.0" standalone="yes"?> <Paper uid="W98-0905"> <Title>Front Back Consistency Overgeneration</Title> <Section position="4" start_page="35" end_page="37" type="intro"> <SectionTitle> 2 Formal Approach to the Automatic Acquisition of Phonotactics </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="35" end_page="36" type="sub_section"> <SectionTitle> 2.1 Syllable Classes </SectionTitle> <Paragraph position="0"> The phonological word is usually defined as a sequence of syllables; in fact, not taking this general approach would mean ignoring a basic phonological regularity (the standard arguments in favour of the syllable are summarised e.g. in Blevins (1994)). Phonological description has, as a rule, described syllables in terms of a single structure consisting of smaller units of description (usually onset, peak and coda) on which certain constraints hold, and words as sequences of one or more occurrences of this structure, on which by assumption no further constraints hold. In many languages, however, word-initial and/or word-final consonant clusters differ from other consonant clusters with regard to (co-)occurrence constraints. Goldsmith (1990, p. 107ff) lists several examples from different languages. This has resulted in the use of the notion of extrasyllabicity to account for 'extra' consonantal segments at the beginnings and the ends of words. Similar problems occur with regard to tonal and metrical regularities, where the first and/or the last vowels in words are often referred to as 'extratonal' and/or 'extrametrical'¹.</Paragraph> <Paragraph position="1"> There are two problems here. 
The first is that if a phonological theory assumes a single syllable class for a language and if the language has idiosyncratic word-initial and word-final phonotactics, then the set of possible words that the theory hypothesises is necessarily too large, and includes words that form systematic (rather than accidental) gaps in the language.</Paragraph> <Paragraph position="2"> The second problem is that if extrasyllabicity is used to reduce the first problem, then the resulting theory of syllable structure fails to account for everything that it is intended to account for, and is forced to integrate extrasyllabic material directly at the word level.</Paragraph> <Paragraph position="3"> Furthermore, it is likely that all languages display some phonological idiosyncrasy at the beginnings and/or ends of phonological words.</Paragraph> <Paragraph position="4"> For these reasons, it seems more practical to make the general assumption that a word is of the form S_I S_M* S_F (where S_I stands for initial syllable, S_M for medial syllable, and S_F for final syllable)². These basic syllable classes with different associated sets of phonotactic constraints enable the integration at the syllable level of segments traditionally accounted for by extrasyllabicity, and result in a more accurate hypothesis of a language's set of possible words. Monosyllabic words, often highly idiosyncratic³, may have to be accounted for separately, by a syllable class S_mono.</Paragraph> <Paragraph position="5"> ¹ E.g. in the case of Kirundi, where words with an initial vowel have no tone assigned to the first vowel by word-level phonology, and in Central Siberian Yup'ik, where final syllables are never stressed (Goldsmith, 1990, p. 29 and p. 179 respectively).</Paragraph> <Paragraph position="6"> CELEX).</Paragraph> <Paragraph position="7"> Consider as an example the syllable statistics from the German part of the lexical database CELEX (Baayen et al., 1995) shown in Figure 1. 
The statistics of set sizes and intersections suggest that four syllable classes are needed for German (initial, medial, final and monosyllables). Hypotheses for possible German words based on a single syllable class (S+) would arrive at much larger word sets than a hypothesis based on four syllable classes, and because it overgenerates, the theory would not reflect some of the phonotactic constraints that the statistics suggest hold for German.</Paragraph> <Paragraph position="8"> ² Dafydd Gibbon, personal communication.</Paragraph> <Paragraph position="9"> In addition to word-initial and word-final position, syllables may have idiosyncratic phonotactics as a result of tone and stress effects⁴. It therefore seems natural to propose language-specific syllable class systems where each class has its own set of phonotactic constraints (intra-syllabic constraints) assigned to it. Words can then be defined as sequences of syllables, where language-specific 'syllabletactics' (inter-syllabic constraints) constrain the possible combinations of syllables from different classes, and hence the possible phonological forms of words⁵.</Paragraph> </Section> <Section position="2" start_page="36" end_page="37" type="sub_section"> <SectionTitle> 2.2 Syllabic Sections </SectionTitle> <Paragraph position="0"> For an automatic method of constructing phonotactic descriptions, the syllable as a unit of description is problematic: the methods available for syllabification have recourse to morphological knowledge or to an underlying, more abstract, underspecified level of description, and/or involve the notion of extrasyllabicity, and generally tend to require an amount of prior knowledge of language-specific phonotactics that is unacceptable where the aim is to discover these very constraints automatically.</Paragraph> <Paragraph position="1"> The main problems with syllabification arise from difficulties in assigning consonantal segments to exactly one syllable, or drawing unambiguous 
syllable boundaries between adjacent codas and onsets. Locating syllable peaks, or dividing lines between vocalic and consonantal segments (distinguishable in the acoustic signal), is less problematic, and the approach to word segmentation proposed here exploits the relative ease with which peak boundaries can be located⁶. This requires the introduction of the term syllabic section to describe a grouping of phonological segments consisting of a peak and the consonantal material between it and either the preceding or the following peak.</Paragraph> <Paragraph position="2"> While the resulting sections are not syllables in the traditional sense, they are syllabic in that they form single stress- and tone-bearing units.</Paragraph> <Paragraph position="3"> they are the subject of ongoing research.</Paragraph> <Paragraph position="4"> ⁶ Ambiguous material such as glides on peak boundaries poses no problem as long as it is consistently grouped either with the peak or with the surrounding consonantal material.</Paragraph> </Section> <Section position="3" start_page="37" end_page="37" type="sub_section"> <SectionTitle> 2.3 Learning Task </SectionTitle> <Paragraph position="0"> Phonological words are thus analysed in terms of intra-syllabic and inter-syllabic constraints as described in Section 2.1, while the traditional syllable is replaced by syllabic sections for the reasons outlined in Section 2.2.</Paragraph> <Paragraph position="1"> In some languages (such as German, e.g. in the analysis of Jusek et al. (1994) shown in Figure 3) only peak and coda constrain each other, while in other languages (such as Russian) only onset and peak are mutually constrained (e.g. Halle (1971)). 
The third possibility is that both types of constraints occur in the same language.</Paragraph> <Paragraph position="2"> In order to allow for all three possibilities, the following approach is taken: each word in a given training sample is scanned and segmented in two ways, once by division before the peak and once by division after it. This two-way word segmentation results in the following two analyses:</Paragraph> </Section> <Section position="4" start_page="37" end_page="37" type="sub_section"> <SectionTitle> Initial Medial Final </SectionTitle> <Paragraph position="0"/> <Paragraph position="2"> In both cases, the initial and final sections together contain three subsections that can be interpreted as the onset, peak and coda of a traditional syllable, which makes it possible to use the same analysis to account for words of arbitrary length, including monosyllables if appropriate. This approach also has the advantage that it can incorporate constraints that cross the boundaries of traditional syllables, such as assimilation phenomena.</Paragraph> <Paragraph position="3"> For a given training sample of words, in the first scan, all initial syllabic sections resulting from the word segmentation described above are grouped together in data set D1, all final sections in data set D3, and all remaining sections (regardless of how many result from each word) in D2. The same process results in data sets D4-D6 for the second scan. The learning task is then to automatically construct an acyclic FSA on the basis of each data set, resulting in six automata A1-A6. Two cyclic automata C1 and C2 are then constructed (corresponding to the two scans) that have the following structure, where A1 and A4 correspond to A_I, A2 and A5 to A_M, and A3 and A6 to A_F: [Figure: structure of the cyclic automata, with components A_I, A_M and A_F] The final result is a hypothesis of the (word-level) phonological grammar of a given language, based on a given training sample, encoded by the intersection of C1 and C2 (i.e. 
a word has to be accepted by both in order to be considered well-formed). The present discussion is restricted to a basic syllable class system, but it is likely that descriptive accuracy can be further improved by extending this basic system to include tone and stress effects. This would of course result in more complex automata C1 and C2 (trivially inferrable here).</Paragraph> </Section> </Section> </Paper>