<?xml version="1.0" standalone="yes"?> <Paper uid="W97-1107"> <Title>Stochastic phonological grammars and acceptability</Title> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 7) S --> O R </SectionTitle> <Paragraph position="0"> Some more recent theories of the syllable do not have onsets and rhymes as such, but distinguish the region of the syllable up to the head vowel from the region consisting of the head vowel and any following tautosyllabic consonants. The internal decomposition of the onset and the rhyme is highly controversial, with some theories positing highly articulated tree structures and others no structure at all. We sidestep this issue by taking onsets and rhymes to be unanalyzed strings.</Paragraph> <Paragraph position="1"> We adopted this approach because a prosodic grammar with two node levels is already sufficiently complex for our purpose, which is to compare the effects of local and diffuse phonotactic deviance.</Paragraph> <Paragraph position="2"> One might think that rules 1) - 7), augmented by a large set of rules for spelling out the terminals, would provide a sufficient grammar to describe English monosyllabic and disyllabic words. But they do not. Difficulties arise because the inventories of onsets and rhymes are not the same at all positions in the word. Attempts to accommodate this fact provide a mainstay of the literature on syllabification. The main qualitative observations are the following: 1) Extra consonants are found at the end of the word which are non-existent or rare at the end of word-internal syllables. The coronal affixes (/s/, /t/, and /θ/) provide the best known example of extra consonants. However, the pattern is much more pervasive, with many cases involving neither independent morphemes nor coronal consonants.</Paragraph> <Paragraph position="3"> Rhymes such as /ɛmp/ (as in "hemp") and /ælk/ (as in "talc") are also more prevalent at the end of the word than in the middle. 2) Light syllables with a lax full vowel are permitted only nonfinally. 3) Word-initial syllables need not have an onset, whereas word-medial syllables usually have an onset (of at least one consonant): hiatus is uncommon.</Paragraph> <Paragraph position="4"> Extraneous consonants at the word edges can be generated by supplementing a grammar of type 1) - 7) with additional rules for the word edges, as in 8).</Paragraph> <Paragraph position="6"> As noted in McCarthy and Prince (1993), such a treatment fails to capture the fact that word edges provide a location for defective syllables in addition to overlarge ones. When we turn to probabilistic models, the limitations of the approach in 8) become even more apparent. The probability distributions for all onsets and rhymes depend on the position in the word. For example, /t/ is possible in coda position both finally (as in "pat") and medially (as in "jit.ney"). A classical grammar would stop at that. But a probabilistic grammar must undertake to model the fact that /t/ is much more common as a word-final coda than as a word-medial one, and that acceptability judgments by native speakers reflect this fact (Pierrehumbert, 1994). Therefore, we handle deviance at the word edges in a different manner.</Paragraph>
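The kind of positional dependence just described can be made concrete with a small sketch. The snippet below is ours, not the authors' implementation, and the counts are invented; it simply estimates position-conditioned coda probabilities of the sort a stochastic grammar must model, e.g. that /t/ is more probable as a word-final coda than as a word-medial one.

# Minimal sketch (not the authors' implementation): estimating
# position-conditioned coda probabilities from invented counts.
from collections import defaultdict

# Hypothetical (coda, position) counts; a real tabulation would be taken
# from a parsed dictionary such as Mitton (1992).
counts = {
    ("t", "final"): 900, ("t", "medial"): 120,
    ("n", "final"): 500, ("n", "medial"): 400,
    ("mp", "final"): 60, ("mp", "medial"): 5,
}

def conditional_probs(counts):
    """Return p(coda | position) for every observed (coda, position) pair."""
    totals = defaultdict(int)
    for (coda, pos), n in counts.items():
        totals[pos] += n
    return {(coda, pos): n / totals[pos] for (coda, pos), n in counts.items()}

probs = conditional_probs(counts)
print(probs[("t", "final")])   # noticeably larger than ...
print(probs[("t", "medial")])  # ... the medial estimate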
<Paragraph position="7"> Stochastic grammars provide us with the possibility of describing such effects by expanding (or rather, failing to collapse) the rules for subordinate nodes in the tree. Instead of attempting to assign a probability to rule 7), which applies regardless of the position of the syllable in the tree, we label the syllable nodes according to their position in the word, and propagate this labelling through all lower expansions. The total inventory of syllable types is then: Ssi, strong initial syllables which are not also final; Ssf, strong final syllables which are not also initial; Ssif, strong syllables which are both initial and final; and similarly for the weak syllables, Swi, Swf and Swif. For a lexicon which included longer words, it would of course also be necessary to provide for medial syllables.</Paragraph> <Paragraph position="8"> Propagating this type of indexing, we can then provide for the fact that the rhyme /ɛmp/ is more common word-finally than elsewhere, by assigning different probabilities to the position-indexed rhyme expansions. This is, obviously, a brute force solution to the problem. It has the penalty that it treats as unrelated cases which are, in fact, related. In order to allow monosyllabic words to display both word-initial anomalies for the onset and word-final anomalies for the rhyme, it is necessary to posit the categories Ssif and Swif. But then the expansion of the Ssif rhyme becomes formally unrelated to that of the Ssf rhyme, and that of the Ssif onset is unrelated to that of the Ssi onset. The practical penalty is that the proliferation of logically different types under this approach reduces the count of words which can be used in training the probabilities for any individual case. For the rarer cases, the result can be that the sample sizes are reduced to a point at which statistically reliable estimates of the probabilities are no longer available from a full-size dictionary.</Paragraph> <Paragraph position="9"> This is a scientific problem in addition to an engineering problem. In developing robust and productive phonotactics, speakers must have a better ability than standard stochastic CFGs provide to treat different contexts as analogous, so that data over these contexts can be collapsed together. In developing the present parser, we have made a further assumption which allows us to circumvent this problem. In general, the phonological effects of edges are concentrated right at the edge in question. This means that the effect of the left word edge is concentrated on the onset, while the effect of the right word edge is concentrated on the rhyme. The tabulation of probabilities can then be organized according to the vertical, root-to-frontier paths through the tree, with only a highly restricted reference to the horizontal context. Specifically, we claim that the root-to-frontier paths are tagged only for whether the frontier is at the left and/or the right edge of the word. Some example paths are those of a word whose strong initial syllable is /kæn/,</Paragraph> <Paragraph position="11"> which we write for convenience as U : W : Ssi : Osi : k, U : W : Ssi : Rsi : æn, etc.</Paragraph>
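As a concrete illustration of this path representation, the sketch below (our own; only the notation U : W : Ssi : Osi : k is taken from the text, and the tree encoding and helper names are assumptions) extracts root-to-frontier paths from a word parsed into onsets and rhymes, tagging each syllable for whether it sits at the left and/or right edge of the word.

# Sketch only: deriving root-to-frontier paths of the form
# U : W : Ssi : Osi : k (written here without spaces) from a toy parse.

def syllable_tag(stress, is_initial, is_final):
    # e.g. strong + initial -> "Ssi"; strong + initial + final -> "Ssif"
    return "S" + stress + ("i" if is_initial else "") + ("f" if is_final else "")

def word_paths(syllables):
    """syllables: list of (stress, onset, rhyme) triples, e.g. [("s", "k", "aen"), ("w", "d", "i")]."""
    paths = []
    n = len(syllables)
    for i, (stress, onset, rhyme) in enumerate(syllables):
        tag = syllable_tag(stress, i == 0, i == n - 1)
        # Onset and rhyme nodes inherit the syllable's positional tag.
        if onset:
            paths.append(f"U:W:{tag}:O{tag[1:]}:{onset}")
        paths.append(f"U:W:{tag}:R{tag[1:]}:{rhyme}")
    return paths

# A disyllable with a strong initial syllable /kaen/ and a weak final syllable /di/:
print(word_paths([("s", "k", "aen"), ("w", "d", "i")]))
# -> ['U:W:Ssi:Osi:k', 'U:W:Ssi:Rsi:aen', 'U:W:Swf:Owf:d', 'U:W:Swf:Rwf:i']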
<Paragraph position="12"> Although the resulting representations are reminiscent of those used in data-oriented parsing (see Bod, 1995), there is a very important difference. The paths we use partition the data; each terminal string is an instance of only one path type, with the result that the probabilities add up to one over all paths. Paths are therefore properly treated as statistically independent, modulo any empirical dependencies which we have failed to model. DOP posits multiple descriptions which can subsume each other, so that any given syntactic fragment can contribute to many different descriptions. As a result, the descriptions are not independent by the very nature of the way they are set up. To use the paths in parsing new examples, we zip consistent paths together from their roots downwards, unifying neighbouring categories as far down the paths as possible, an operation we call sequential path unification. The probability of the combined path is taken to be the product of the probabilities of the two parts. That is, since the original path set partitioned the data, a finite state model is a justifiable method of combining paths.</Paragraph> <Paragraph position="13"> Onsets and rhymes which are unattested in the original dictionary are assigned a nominal low probability by Good-Turing estimation (Good, 1953), which Bod (1995) argues to be better behaved than alternative methods for dealing with missing probability estimates for infrequent items.</Paragraph> <Paragraph position="14"> The sequencing constraints described by the original grammar (for example, the requirement that an onset be followed by a rhyme and not by another onset) are enforced by tagging some nodes for the type of element which must succeed them, in a fashion reminiscent of categorial grammar. That is, onsets must be followed by rhymes with the same i/f and s/w subscripts, and initial syllables must be followed by final syllables, with an initial weak syllable followed by a strong syllable or an initial strong syllable followed by a weak one.</Paragraph> <Paragraph position="16"> In 15b), the parse fails as the initial Osi is not followed by an Rsi, as it requires.</Paragraph>
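A rough sketch of the sequential path unification operation and of the product-of-probabilities combination described above is given below. It is ours, not the authors' code: the path probabilities are invented placeholders, the UNSEEN constant merely stands in for a proper Good-Turing estimate, and the categorial-style sequencing check of the preceding paragraph is omitted for brevity.

# Sketch only: sequential path unification over root-to-frontier paths.
# The probabilities below are invented; in the paper they are estimated
# from the parsed Mitton (1992) dictionary, with Good-Turing estimates
# for unattested onsets and rhymes.

path_probs = {
    "U:W:Ssi:Osi:k": 0.012,
    "U:W:Ssi:Rsi:aen": 0.004,
    "U:W:Swf:Owf:d": 0.009,
    "U:W:Swf:Rwf:i": 0.020,
}
UNSEEN = 1e-6  # stand-in for a Good-Turing estimate of an unattested constituent

def unify(path_a, path_b):
    """Zip two paths from the root downwards, sharing categories as far as they agree."""
    a, b = path_a.split(":"), path_b.split(":")
    shared = 0
    while shared < min(len(a), len(b)) and a[shared] == b[shared]:
        shared += 1
    # Keep one copy of the shared prefix, then the two distinct continuations.
    return a + b[shared:]

def word_probability(paths):
    """Probability of a word as the product of its (assumed independent) path probabilities."""
    p = 1.0
    for path in paths:
        p *= path_probs.get(path, UNSEEN)
    return p

paths = ["U:W:Ssi:Osi:k", "U:W:Ssi:Rsi:aen", "U:W:Swf:Owf:d", "U:W:Swf:Rwf:i"]
print(unify(paths[0], paths[1]))   # shared prefix U, W, Ssi; then Osi:k and Rsi:aen
print(word_probability(paths))     # product of the four path probabilities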
<Paragraph position="17"> 3 How the training was carried out
To establish the path probabilities for English monosyllabic and disyllabic words, the paths were tabulated over the 48,580 parsed instances of such words in Mitton (1992). With each word containing two to four paths, there was a total of 98,697 paths in the training set.</Paragraph> <Paragraph position="18"> Parsing such a large set of words requires one to take a stand on some issues which are disputed in the literature. Here are the most important of these decisions. 1) We included every single form in the dictionary, including proper nouns, no matter how foreign or anomalous it might appear to be, because we have the working hypothesis that low probabilities can explain the poor productivity of anomalous patterns. 2) Following current phonological theory (see e.g. Ito 1988), we syllabified all word-medial VCV sequences as V.CV. As a related point, we took medial clusters beginning with /s/ to be syllable onsets when they were possible as word onsets. If the sC sequence is not an attested word onset, it was split medially (e.g. "bus.boy"). There are a number of situations in which the dictionary does not mark phonological information which we know to be important. We have done our best to work around this fact, but in some cases our estimates are inevitably contaminated. Specifically: first, although compounds which are hyphenated in the dictionary can be (correctly) parsed as two phonological words, many compounds have no indication of their status and are parsed as if they were single words; similarly, words with affixes such as -ly and -ness have been parsed as if they had no internal structure. This contaminates the counts for nonfinal rhymes with a certain number of final rhymes, and it contaminates the counts for noninitial onsets with a certain number of word-initial onsets. Second, stress is not marked in monosyllabic words. We have therefore taken all monosyllabic words to have a main word stress. As a result, a few reduced pronunciations for function words are included, with the result that there is a small, rather than a zero, probability for stressed syllable rhymes with a schwa. Third, secondary stresses are not reliably marked, particularly when adjacent to a primary stress (as in the word "Rangoon"). This means that a certain number of stressed rhymes have been tabulated as if they were unstressed. These problems can for the most part be viewed as sources of noise. We believe that the main trends of our tabulations are correct. To illustrate the fact that positional probabilities differ, Table 1 compares the 10 most frequent onsets and rhymes in each position. [Table 1, recoverable fragments: Osf onsets: s 234, t 206, l 193, r 164, p 157, m 152, v 152, f 139, d 123, k 123; Rsf rhymes: em 45, elt 41, eIts 37, et 37, es 34, iz 34, ekt 33, ekts 33, ent 33, eI 32.]</Paragraph> </Section>
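The tabulation just described amounts to counting path types over the parsed dictionary and reading off the most frequent onsets and rhymes for each positional category, as in Table 1. A minimal sketch of that bookkeeping (ours, with a toy word list standing in for the 48,580 parsed Mitton entries) might look as follows.

# Sketch only: tabulating path frequencies over a parsed word list and
# listing the most frequent onsets/rhymes per positional category.
from collections import Counter

parsed_words = [
    [("Osi", "k"), ("Rsi", "aen"), ("Owf", "d"), ("Rwf", "i")],  # a disyllable
    [("Osif", "p"), ("Rsif", "aet")],                            # a monosyllable
    [("Osif", "t"), ("Rsif", "aelk")],
]

counts = Counter()
for word in parsed_words:
    for category, constituent in word:
        counts[(category, constituent)] += 1

def most_frequent(category, n=10):
    """Top-n constituents for one positional category, with their counts."""
    items = [(c, k) for (cat, c), k in counts.items() if cat == category]
    return sorted(items, key=lambda x: -x[1])[:n]

print(most_frequent("Osif"))  # most frequent onsets of strong initial-and-final syllables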
<Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Neologisms </SectionTitle> <Paragraph position="0"> The data set we used to evaluate the parser was obtained in a prior study (Coleman 1996). The goal of this study was to evaluate the psychological reality of phonotactic constraints.</Paragraph> <Paragraph position="1"> The materials were designed to permit minimal comparisons between a nonsense word which was in principle possible and one which was expected to be impossible by virtue of containing an onset or a rhyme which does not occur at all in the Mitton (1992) dictionary. Thus, the materials were made up of paired words such as a form beginning with the cluster /ml/ (impossible, because /ml/ is unattested as an onset) and an otherwise identical form containing the attested cluster /gl/ instead of /ml/.</Paragraph> <Paragraph position="2"> The materials were randomized, with a post-hoc test to ensure that related items in a pair were separated in the presentation. The words were recorded by John Coleman and presented aurally, twice over, to 6 naive subjects, who judged whether each word could or could not be a possible English word by pressing one of two response buttons. The total number of responses against the well-formedness of each word was taken as a score of subjective degree of well-formedness. The distributions of scores of forms containing non-occurring clusters and those containing occurring clusters were significantly distinct.</Paragraph> <Paragraph position="3"> Forms which were designed to be "bad" were judged significantly worse than forms which were designed to be "good". This was the case for the pooled data, and within the matched pairs the "bad" variant received a lower score than the "good" variant in 61/75 pairs. However, the data contained a number of surprises, some of which, indeed, motivated the present study. The scores of the "bad" forms were much more variable than anticipated. "Bad" forms in some pairs (e.g. /mruˈpeɪʃn/) were scored better than "good" forms in other pairs (e.g. /ˈsplɛtsəm/). Apparently, a single subpart of zero (observed) probability is not enough to render a form impossible. Conversely, forms which violate no constraints, but which are composed of low-frequency constituents and have few lexical neighbors, are assigned low acceptability scores, e.g. /ˈfɪŋkslʌp/ and /ʃɔˈlɛntʃ/, which scored 12, i.e. completely unacceptable.</Paragraph> <Paragraph position="4"> These findings are contrary to the predictions both of a classical phonological treatment (according to which linguistic competence is categorical, and forms which cannot be parsed are impossible) and of Optimality Theory (in which a single severe deviation should determine the evaluation of the form). Apparently, the well-formed subparts of an otherwise ill-formed word may alleviate the ill-formed parts, especially if their frequency is high, as in the "ation" part of "mrupation" (/mruˈpeɪʃn/). We used the stochastic grammar to parse the 116 mono- and di-syllabic neologisms from the earlier study, and compared various methods of scoring the goodness of the parse as a predictor of the experimentally obtained measure of acceptability. Specifically, we compared the four alternatives discussed in the introduction. Of the four proposals for scoring phonotactic well-formedness, three yield statistically significant correlations with the experimentally obtained judgments. (Significance was assessed via a t-test on r, two-tailed, df = 114.) Scoring method 2) is a better model of acceptability than 1) because it linearizes the exponential shape of p(word) arising from the multiplication of successive parts. Figure 1 is a scatterplot of the best correlation, ln(p(word)) against the number of votes against well-formedness. It is apparent that less probable words are less acceptable.</Paragraph> </Section> </Paper>