<?xml version="1.0" standalone="yes"?> <Paper uid="W02-0607"> <Title>Modeling English Past Tense Intuitions with Minimal Generalization</Title> <Section position="3" start_page="0" end_page="2" type="metho"> <SectionTitle> 2 Criteria for Evaluating Models </SectionTitle> <Paragraph position="0"> A number of properties are desirable in a learning model whose goal is to mimic human intuition. We have been motivated to develop our own model in part because these criteria have rarely been met by previous models. Such models include, for example, connectionist models (Rumelhart and McClelland 1986, Daugherty and Seidenberg 1994, MacWhinney and Leinbach 1991), neighborhood similarity models (Nakisa, Plunkett and Hahn 2001), decision tree/ILP models (Ling and Marinov 1993, Mooney and Califf 1996, Dzeroski and Erjavec 1997), and other rule-based models (Neuvel, to appear).</Paragraph> <Paragraph position="1"> Our first criterion is that a model should be able to generate complete output forms, rather than just grouping the outputs into (possibly arbitrary) categories such as &quot;regular,&quot; &quot;irregular,&quot; &quot;vowel change,&quot; etc. The reason is that people likewise generate fully specified forms, and a model's predictions can be fully tested only at this level of detail. null Second, a model should be able to make multiple guesses for each word and assign numerical well-formedness scores to each guess. People, too, often favor multiple outcomes, and they also have gradient preferences among the various possibilities (Prasada and Pinker 1993).</Paragraph> <Paragraph position="2"> Third, a model should be able to locate detailed generalizations. Here is an example: English past tenses are often formed by changing [] to [] when the final consonant of the word is [] (fling-flung, cling-clung, sting-stung). As experiments show, such generalizations are learned by speakers of English (that is, speakers do more than just memo- null The Analogical Model of Language (Skousen 1989, Eddington 2002) satisfies all of our criteria. However, in our use of this model so far, we have been unable to find any setting of its parameters that can achieve good correlations to our experimental data, reported below in section 4.</Paragraph> <Paragraph position="3"> On a practical level, an ability to consider multiple outputs would also improve the performance of a recognition system. For example, a system not told that spelt is a dialectal past tense for spell should be able to interpret it as such, even if spelled were its first choice.</Paragraph> <Paragraph position="4"> July 2002, pp. 58-69. Association for Computational Linguistics.</Paragraph> </Section> <Section position="4" start_page="2" end_page="2" type="metho"> <SectionTitle> ACL Special Interest Group in Computational Phonology (SIGPHON), Philadelphia, </SectionTitle> <Paragraph position="0"> Morphological and Phonological Learning: Proceedings of the 6th Workshop of the rize each irregular verb). For example, experimental participants often volunteer splung as the past tense of spling, extending the generalization to a novel verb.</Paragraph> <Paragraph position="1"> The importance of detailed generalizations is not limited to irregular forms. We have found that speakers are often sensitive to detailed generalizations even among regulars. For example, verbs in English ending in voiceless fricatives ([f, T, s, S]) are always regular. 
Our experiments indicate that English speakers are tacitly aware of this pattern. Thus, an accurate model of their linguistic intuitions must be able to detect and learn the pattern in the training data.</Paragraph> <Paragraph position="4"> Although detailed generalizations are important, it is also crucial for a learning model to be able to form very broad generalizations. The reason is that general morphological patterns cannot be learned simply as the aggregation of detailed patterns. Speakers can generate novel inflected forms even for words that do not fit any of the detailed patterns (Pinker and Prince 1988, Prasada and Pinker 1993). Thus, a general rule is needed to derive an output where no close analogues occur in the training set. A special case of this sort is where the base form ends in a segment that is not phonologically legal in the language (Halle 1978). Thus, the German name Bach can be pronounced by some English speakers with a final voiceless velar fricative [x]. Speakers who can pronounce this sound agree firmly that the past tense of to out-Bach must be [aʊtbaxt] (Pinker 1999), following a generalization which is apparently learned on the basis of ordinary English words.</Paragraph> <Paragraph position="5"> In summary, we believe it is important that a learning model for morphology and phonology should produce complete output forms, generate multiple outputs, assign each output a well-formedness score, and discover both specific and broad generalizations.</Paragraph> </Section> <Section position="5" start_page="2" end_page="6" type="metho"> <SectionTitle> 3 Description of the Model </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="2" end_page="3" type="sub_section"> <SectionTitle> 3.1 Rule induction by minimal generalization </SectionTitle> <Paragraph position="0"> Our model employs a bottom-up approach to learning, iteratively comparing pairs of surface forms to yield ever more general rules. It takes as its input ordered pairs of forms which stand in a particular morphological relation - e.g., (present, past) - and compares the members of each pair to construct rules that derive one from the other. As an example, consider the pairs of forms in (1): (1) mɪs ~ mɪst 'miss(ed)', prɛs ~ prɛst 'press(ed)', læf ~ læft 'laugh(ed)', hʌg ~ hʌgd 'hug(ged)', rʌb ~ rʌbd 'rub(bed)', nid ~ nidəd 'need(ed)', dʒʌmp ~ dʒʌmpt 'jump(ed)', plæn ~ plænd 'plan(ned)'.</Paragraph> <Paragraph position="1"> When we compare the present and past forms of each word, we see that the relation between them can be expressed as a structural change (in this case, adding [-t], [-d], or [-əd]) in a particular context (after [mɪs], after [hʌg], etc.). Formally, the structural change can be represented in the format A → B, and the context in the format / C __ D, to yield word-specific rules like those in (2). (The symbol '#' stands for a word boundary.) (2) Ø → t / #mɪs __ #; Ø → t / #prɛs __ #; Ø → t / #læf __ #; Ø → d / #hʌg __ #; Ø → d / #rʌb __ #; Ø → əd / #nid __ #; Ø → t / #dʒʌmp __ #; Ø → d / #plæn __ #. The exact procedure for finding a word-specific rule is as follows: given an input pair (X, Y), the model first finds the maximal left-side substring shared by the two forms (e.g., #mɪs), to create the C term (left-side context). The model then examines the remaining material and finds the maximal substring shared on the right side, to create the D term (right-side context). The remaining material is the change: the non-shared string from the first form is the A term, and the non-shared string from the second form is the B term. (Note that either A or B can be zero. When A is zero and edge-adjacent, we are dealing with an affixational mapping. When B is zero and edge-adjacent, we are dealing with some sort of truncation, e.g., the mapping from English plurals to singulars. When neither A nor B is zero, we are dealing either with two paradigm members that each have their own affix, or with cases of ablaut or similar nonconcatenative morphology.)</Paragraph>
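<Paragraph> To make the factoring step concrete, here is a minimal sketch in Python. It is our illustrative reconstruction, not the authors' implementation; '#' marks word boundaries and ASCII letters stand in for the IPA symbols. </Paragraph>
```python
# A sketch of word-specific rule discovery: factor an input pair (X, Y)
# into a change A -> B and a context C __ D, such that X = CAD and Y = CBD.

def factor_pair(x, y):
    c = 0                                   # length of maximal shared left-side substring
    while c < min(len(x), len(y)) and x[c] == y[c]:
        c += 1
    d = 0                                   # length of maximal shared right-side substring
    while d < min(len(x), len(y)) - c and x[len(x) - 1 - d] == y[len(y) - 1 - d]:
        d += 1
    a = x[c:len(x) - d]                     # non-shared material of X (the A term)
    b = y[c:len(y) - d]                     # non-shared material of Y (the B term)
    return a, b, x[:c], x[len(x) - d:]      # (A, B, C, D)

# miss ~ missed ('I' standing in for the lax vowel): 0 -> t / #mIs __ #
print(factor_pair("#mIs#", "#mIst#"))       # ('', 't', '#mIs', '#')
```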
<Paragraph position="2"> As such word-specific rules accumulate, the model attempts to generalize. As soon as two rules with the same structural change have been discovered, their contexts are compared to yield a more general rule, retaining all shared context material and replacing all non-shared material with a variable. Here is the generalization process as applied to miss and press: the rules Ø → t / #mɪs __ # and Ø → t / #prɛs __ # are collapsed into the more general rule Ø → t / X [+syllabic, -low, -back, -tense, -round] s __ #.</Paragraph> <Paragraph position="4"> The procedure for comparing the contexts of two rules is much like the procedure for creating a word-specific rule. Working outwards from the structural change, it first locates the maximal right-side substring shared by C1 and C2; this shared substring forms part of the context for the new rule (C'). If C1 and C2 both contain additional unmatched material, then the segments immediately to the left of C' (here, [ɪ] and [ɛ]) are compared to see what features they have in common. If they share any feature specifications, these are retained as a left-side featural term (C'feat), in this case [+syllabic, -low, -back, -tense, -round]. Finally, if C1 or C2 contains any additional material that has not been included in C' or C'feat, this is converted into a free variable (X). The same procedure is carried out in mirror image on the right, yielding shared D' and D'feat terms, and a right-side variable Y. Any of these terms may be null.</Paragraph> <Paragraph position="7"> This generalization procedure retains as much shared material as possible, yielding the most specific rule that will cover both input forms. For this reason, we call it minimal generalization. Minimal generalization is iterated over the data set. Iteration consists of comparing word-specific rules against other word-specific rules, and also against generalized rules. (We believe, but have not proven, that no additional rules are discovered by comparing generalized rules against generalized rules.)</Paragraph> <Paragraph position="8"> The procedure for comparing a word-specific rule with a generalized rule is much the same, but with the complication that it is often necessary to compare a segment in the word-specific rule with a featural term (C'feat or D'feat) in the generalized rule.</Paragraph> <Paragraph position="11"> The result of this procedure is a large list of rules, describing all of the phonological contexts in which each change applies. The fact that the model retains rules for each change means that it has the potential to generate multiple outputs for a novel input, satisfying one of the criteria we proposed in section 2.</Paragraph> <Paragraph position="12"> In some learning models, the goal of rule induction is to find the most general possible rule for each change. However, as noted above, we also require our model to assign gradient well-formedness scores to each output. To do this, we evaluate the reliability of rules, then evaluate outputs on the basis of the rules that derive them.</Paragraph> </Section>
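<Paragraph> The context-comparison step just described can likewise be sketched in code. The snippet below handles left-side contexts only, and the feature table is a two-entry toy standing in for the paper's full feature system; everything here is an illustrative assumption rather than the actual implementation. </Paragraph>
```python
# A simplified sketch of minimal generalization over the left contexts of
# two rules that share a structural change (here, the miss/press example).

FEATS = {  # toy feature specifications; 'I' and 'E' stand in for the lax vowels
    'I': dict(syll='+', high='+', low='-', back='-', tense='-', rnd='-'),
    'E': dict(syll='+', high='-', low='-', back='-', tense='-', rnd='-'),
}

def generalize_left(c1, c2):
    """Collapse two left contexts into (variable, featural term, shared string)."""
    shared = ""                             # 1. shared material adjacent to the change
    while c1 and c2 and c1[-1] == c2[-1]:
        shared = c1[-1] + shared
        c1, c2 = c1[:-1], c2[:-1]
    feat_term = None                        # 2. features shared by the next segments out
    if c1 and c2:
        f1, f2 = FEATS[c1[-1]], FEATS[c2[-1]]
        feat_term = {k: v for k, v in f1.items() if f2.get(k) == v}
        c1, c2 = c1[:-1], c2[:-1]
    var = "X" if (c1 or c2) else ""         # 3. leftover material becomes a free variable
    return var, feat_term, shared

# #mIs and #prEs collapse to X [+syll, -low, -back, -tense, -rnd] s __
print(generalize_left("#mIs", "#prEs"))
# ('X', {'syll': '+', 'low': '-', 'back': '-', 'tense': '-', 'rnd': '-'}, 's')
```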
<Section position="2" start_page="3" end_page="4" type="sub_section"> <SectionTitle> 3.2 Calculating reliability and confidence </SectionTitle> <Paragraph position="0"> The reliability of rules is calculated as follows. First, we determine the number of forms in the training data that meet the structural description of the rule (for A → B / C __ D, these are the forms that contain CAD). This number is the scope of the rule. The hits of the rule is the number of forms that it actually derives correctly. The reliability of a rule is simply the ratio of its hits to its scope.</Paragraph> <Paragraph position="1"> Intuitively, reliability is what makes a rule trustable. However, reliability based on high scope (for example, 990 correct predictions out of 1000) is better than reliability based on low scope (for example, 5 out of 5). Following Mikheev (1997), we therefore adjust reliability using lower confidence limit statistics. The formula is as follows: first, a particular reliability value p^ is smoothed to avoid zeros in the numerator or denominator, yielding an adjusted value p^* = (hits + 0.5) / (scope + 1). This adjusted reliability value is then used to estimate the true variance of the sample: estimated variance = p^*(1 - p^*) / n, where n is the scope. Finally, this variance is used to calculate the lower confidence limit (π_lower) at the confidence level α: π_lower = p^* - z·√(p^*(1 - p^*) / n). (The value z for confidence level α is found by look-up table.) The amount of the adjustment is thus governed by the parameter α, which ranges over .5 < α < 1; the higher the value of α, the more drastic the adjustment. The result of this adjustment, a value which ranges from 0 to 1, we call confidence. Confidence values are calculated for each generalized rule as soon as it is discovered. As each new input pair is processed, it is compared against previously discovered generalized rules to see whether it adds to their hits or scope. If so, their confidence values are updated.</Paragraph> <Paragraph position="4"> The list of rules, annotated for confidence, can be used to derive outputs for novel (unknown) inputs. In some systems, rules are applied in order of decreasing specificity; the particular rule that is used to derive an output is the most specific one available. In our system, rules are applied in order of decreasing confidence. The novel form is compared against each known change A → B to see if it contains the input to the change (A). If it does, the rules for that change are examined, in order of decreasing confidence, checking each rule to see if it is applicable. Once an applicable rule has been found, it is applied to create a novel output, and the next change (A' → B') is considered. Each output is assigned a well-formedness score, which is the confidence value of the rule that derives it; that is, the confidence value of the best available rule. These well-formedness scores allow the model to satisfy the second criterion laid out in section 2.</Paragraph> <Paragraph position="5"> Minimal generalization and confidence values provide an effective method of discovering the phonological context in which a particular morphological change applies. Rules that describe productive processes in the correct context will have very high confidence, whereas rules that describe exceptional processes or the wrong contexts will have lower confidence.</Paragraph>
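<Paragraph> The confidence statistic itself is compact enough to state directly in code. The sketch below implements the Mikheev-style formula given above, under our assumption that n is the rule's scope; the z look-up values are illustrative. </Paragraph>
```python
# Confidence as the lower confidence limit of smoothed reliability.
import math

Z = {0.75: 0.674, 0.90: 1.282, 0.95: 1.645}     # z for confidence level alpha

def confidence(hits, scope, alpha=0.75):
    p_star = (hits + 0.5) / (scope + 1.0)       # smoothed reliability p^*
    var = p_star * (1.0 - p_star) / scope       # estimated true variance
    return p_star - Z[alpha] * math.sqrt(var)   # lower confidence limit

print(round(confidence(990, 1000), 3))   # 0.987: high scope is barely penalized
print(round(confidence(5, 5), 3))        # 0.833: low scope is heavily penalized
```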
<Paragraph position="21"> Moreover, when a change applies with especially high reliability in some particular context, the rule that the model discovers for this context will have especially high confidence. Thus, for example, the rule that suffixes [-t] in the context of final voiceless fricatives (section 2), which is exceptionless and abundantly attested, is assigned an extremely high confidence value by our model.</Paragraph> </Section> <Section position="3" start_page="4" end_page="5" type="sub_section"> <SectionTitle> 3.3 Improving confidence with phonology </SectionTitle> <Paragraph position="0"> In many cases, it is possible to improve the confidence of morphological rules, and even expand their context, by discovering phonological rules. To continue with the example from (1) above, consider the rule that the model will generalize from the items [hʌg] and [rʌb]. In the feature system we use, the minimal natural class that covers both [g] and [b] is the set of voiced stops [b, d, g], so the model constructs a generalized rule that attaches [-d] after any member of this class.</Paragraph> <Paragraph position="2"> Suppose that the model is presented next with the input pair ([nid], [nidəd]). It first attempts to update the confidence of the previously discovered generalized rules, including the rule adding [-d] after voiced stops. Specifically, it tries to apply each rule to [nid], checking to see if the rule can derive the correct output [nidəd]. When it does this, it discovers that the [-d] affixation rule fails, producing instead the incorrect output *[nidd].</Paragraph> <Paragraph position="3"> What we want the model to do in this situation is to recognize that [nidəd] is in fact an instance of [-d] affixation, but that there is an additional phonological process of [ə] insertion that obscures this generalization.</Paragraph> <Paragraph position="4"> We allow the model to recognize this in the following way: first, we provide it ahead of time with a list of sequences that are illegal in English: *dd#, *td#, *fd#, *pd#, *bt#, and so on. (We believe that it is not unrealistic to do this, because experimental work (Jusczyk et al. 1993, Friederici and Wessels 1993) suggests that children have a good notion of what sound sequences are legal in their language well before they begin to learn alternations.) When the learning model assesses the reliability of a rule and finds that it yields an incorrect output, it compares the incorrect output against the actual form, and hypothesizes a phonological rule of the form A → B / C __ D that would change the incorrect form into the correct one. In this case, applying the [-d] suffixation rule to [nid] yields incorrect *[nidd], which is compared against correct [nidəd], and the phonological rule that is hypothesized is Ø → ə / d __ d.</Paragraph>
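<Paragraph> The repair-rule step can be sketched as follows, reusing the factoring procedure from section 3.1. The illegal-sequence list and the symbol '@' (for schwa) are stand-ins for the real inventories; this is an illustration of the idea, not the authors' code. </Paragraph>
```python
# Hypothesize a phonological repair rule from a wrong/right output pair,
# and check whether its target CAD contains a known illegal sequence.

ILLEGAL = {"dd#", "td#", "fd#", "pd#", "bt#"}

def factor_pair(x, y):                      # as in the earlier sketch
    c = 0
    while c < min(len(x), len(y)) and x[c] == y[c]:
        c += 1
    d = 0
    while d < min(len(x), len(y)) - c and x[len(x) - 1 - d] == y[len(y) - 1 - d]:
        d += 1
    return x[c:len(x) - d], y[c:len(y) - d], x[:c], x[len(x) - d:]

def hypothesize_repair(wrong, right):
    a, b, c, d = factor_pair(wrong, right)
    cad = c + a + d                         # the target of the phonological rule
    motivated = any(bad in cad for bad in ILLEGAL)
    return (a, b, c[-1:], d[:1]), motivated  # rule trimmed to its local context

# *nidd vs. nid@d: hypothesize 0 -> @ / d __ d, motivated by illegal *dd#
print(hypothesize_repair("#nidd#", "#nid@d#"))   # (('', '@', 'd', 'd'), True)
```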
<Paragraph position="5"> Finally, the model examines the target of the phonological rule (CAD, in this case [dd]) to see if it contains a member of the list of known illegal sequences. If so, then the model has discovered a phonological rule that can help the morphological rule to produce the correct output, by fixing a phonologically illegal sequence. In the present case, the phonological rule allows [nid] to be counted as a hit for the morphological rule of [-d] suffixation, thus increasing the latter rule's reliability.</Paragraph> </Section> <Section position="4" start_page="5" end_page="5" type="sub_section"> <SectionTitle> 3.4 Overcoming complementary distribution </SectionTitle> <Paragraph position="0"> Unfortunately, not all phonological rules can be discovered by waiting for morphological rules to produce incorrect outputs. Consider how our model would analyze the pair ([mɪs], [mɪst]) 'miss(ed)'. Using the mechanisms described above, this would initially be treated as a case of [-t] suffixation. However, a more general analysis can be found if we realize that [-t] can be the result of /-d/ suffixation, with a phonological rule of devoicing that converts /-d/ to [-t] after a voiceless consonant. This could be achieved by having the model try attaching [-d] to [mɪs], yielding incorrect *[mɪsd], from which the devoicing rule could be discovered using the procedure described in the previous section. However, under the assumption of strictly minimal generalization, the opportunity to try [-d] after [mɪs] would never arise. The reason is that [-d] suffixation was learned solely on the basis of voiced stems, so it would never apply to a voiceless stem like [mɪs]. More generally, the [-d] and [-t] allomorphs of the English past tense suffix occur in complementary distribution, so a system that uses minimal generalization would never construct rules that attempt to use one allomorph in the environment of the other.</Paragraph> <Paragraph position="1"> Our solution to this problem involves a slight relaxation of minimal generalization. The intuition is that when a new change is discovered (A → B, in this case Ø → d), we should check to see if there are any potentially related changes that have already been discovered (A → B', here Ø → t) that take the same input (A), but yield a different output. The idea is that B and B' might be the result of the same morphological rule, obscured by a phonological change. (The set of possible phonological rules is restricted to inserting a segment, deleting a segment, altering a segment, converting one segment into two (diphthongization), converting two segments into one (simple coalescence), or converting two segments into two others (length-preserving coalescence (/XZ/ → [YY]) and metathesis).)</Paragraph> <Paragraph position="2"> To do this, we take every context that appears in a rule with change A → B and pair it with the change A → B', creating a new set of rules, which we will call cross-context rules. For example, when the model encounters the first pair employing the Ø → d change, it takes all of the existing Ø → t rules and creates cross-context Ø → d variants of them. The result is, among other things, a rule affixing [-d] after voiceless fricatives, mirroring the previously generalized rule affixing [-t] in the same environment.</Paragraph>
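<Paragraph> The cloning step is simple to express in code; in the sketch below a rule is reduced to a (change, context) pair, with contexts given as plain labels for readability. Each clone would then be tested as described next. </Paragraph>
```python
# Cross-context rule creation: when a new change shares its input A with
# an existing change, clone every context known for the existing change.

def cross_context_rules(rules, new_change):
    a_new = new_change[0]
    return [(new_change, ctx)
            for (change, ctx) in rules
            if change[0] == a_new and change != new_change]

rules = [(("", "t"), "after voiceless fricatives"),
         (("", "t"), "after voiceless stops")]
print(cross_context_rules(rules, ("", "d")))
# [(('', 'd'), 'after voiceless fricatives'), (('', 'd'), 'after voiceless stops')]
```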
<Paragraph position="3"> The model then assesses the reliability of this cross-context rule, applying it to (among others) [mɪs] and deriving incorrect *[mɪsd]. By comparing this with the actual output [mɪst], the model posits a phonological rule of devoicing, in the same manner as described in the previous section. It then checks to see if the proposed phonological rule will enable the cross-context rule to produce the same output as the rule from which it was cloned in all cases. If so, the cross-context rule is kept, and can serve as the input for further generalization. Thus, the phonological rule is able to extend the set of contexts in which [-d] affixation successfully applies.</Paragraph> <Paragraph position="6"> With these procedures in place, our model is able to discover a single rule that covers all English regular past tenses, namely Ø → d / ___ #. The various regular past tense allomorphs are derived from /-d/ by phonological rules of voicing assimilation (deriving [-t]) and [ə] insertion (deriving [-əd]). We would guess that these are the rules that are assumed by most linguists; see Pinker and Prince (1988) for a detailed presentation. However, we discuss evidence below suggesting that simple [-d] affixation is not the only rule that derives regulars.</Paragraph> </Section> <Section position="5" start_page="5" end_page="5" type="sub_section"> <SectionTitle> 3.5 The grammar so far </SectionTitle> <Paragraph position="0"> We summarize here the grammar that is learned by our model (as described up until this point) when exposed to a representative corpus of English present-past pairs. The most general rule of the grammar is the noncontextual suffixation rule Ø → d / ___ #; with the help of phonology this rule can derive all regulars. In addition, the model also discovers a large number of rules with lower generality. Many of these rules describe subgeneralizations about the regular process, for example, the highly reliable rule suffixing [-t] (or its underlying counterpart /-d/) after voiceless fricatives. Other rules describe exceptional processes, such as ɪ → ʌ before [ŋ] (fling-flung, wring-wrung, etc.), i → ɛ between a liquid and [d] (bleed-bled, read-read, etc.), and no change after [t] (hit-hit, cut-cut, etc.). In general, such exceptional processes will have much lower confidence than the regular rules, partly because they are based on fewer forms, and partly because there are regular forms that fail to obey them (need-needed, not *ned).</Paragraph> <Paragraph position="1"> Lastly, the model learns a large number of rules that could fairly be described as detritus, because they are never used in deriving any form (other, more reliable rules take precedence over them). In principle, we could prune these rules from the finished grammar, though we have not taken this step in our current implementation.</Paragraph> </Section> <Section position="6" start_page="5" end_page="6" type="sub_section"> <SectionTitle> 3.6 The distributional encroachment problem </SectionTitle> <Paragraph position="0"> Exceptional forms are easy to identify as such when they involve a change that occurs in only a few words, such as ɪ → ʌ.
Not all exceptions have this property, however; sometimes exceptions are disguised by the fact that they involve a change that is regular, but in a different environment.</Paragraph> <Paragraph position="1"> An example of this type of exception is seen in the past tense forms in (6), which occur in some dialects of English: (6) bɝn ~ bɝnt 'burn(t)', lɝn ~ lɝnt 'learn(t)', spɛl ~ spɛlt 'spel(t)'.</Paragraph> <Paragraph position="2"> These words form their past tense using one of the regular changes (Ø → t), but in the wrong environment (after sonorant consonants, rather than after voiceless ones). We call this type of exception distributional encroachment, because one morphological change is encroaching on the phonological context in which another change regularly occurs.</Paragraph> <Paragraph position="3"> Distributional encroachment appears to be a major problem for all morphological learning systems that attempt to find large-scale generalizations. In what follows, we will explain why the example in (6) is problematic, then propose a method for coping with distributional encroachment in general.</Paragraph> <Paragraph position="4"> Assume that prior to hearing any of the forms in (6), the model has already processed a fair number of regular stems ending in voiceless obstruents. Comparing forms like [mɪs]-[mɪst] 'miss(ed)', [læf]-[læft] 'laugh(ed)', and [dʒʌmp]-[dʒʌmpt] 'jump(ed)', the model would learn a number of rules of [-t] suffixation. Since [-t] suffixation after voiceless obstruents is the regular outcome in English, these rules will achieve quite high confidence scores. Moreover, if we are willing to have a phonological rule that voices /-t/ to [-d] after a voiced obstruent, the context of /-t/ suffixation could be expanded to all obstruents. (The voiceless obstruents of English are [p, t, tʃ, k, f, θ, s, ʃ, h], and the voiced obstruents are [b, d, dʒ, g, v, ð, z, ʒ].) Under this analysis, past tense forms like hugged can now be derived as /hʌg/ → hʌgt → [hʌgd], so the confidence for this generalized rule would be even higher.</Paragraph> <Paragraph position="5"> The distributional encroachment problem is encountered when the model, having reached this state, is confronted with one of the exceptional forms in (6). The result will be a serious overgeneralization. Suppose that the first such form encountered is [bɝn]-[bɝnt] 'burn(t)'. The model would first posit a single-form rule adding [-t] after the stem [bɝn]. Then, the generalization procedure would compare it with the other known [-t] affixation rules, all of which apply after obstruents. This comparison would lead to a generalized rule adding [-t] after any consonant at all: Ø → t / [-syllabic] __ #.</Paragraph> <Paragraph position="6"> Let us now estimate the reliability of this generalized rule. Corpus counts show that the final segments of verb stems occur in roughly the following proportions: (7) obstruents 60%, sonorant consonants 25%, vowels 15%. Suppose that prior to learning the form burnt, the model has learned 600 input pairs, of which 500 are regular and 100 are irregular exceptions, none of them of the burnt type. Assume for simplicity that the distribution of final segments in both regulars and irregulars follows the proportions of (7). Thus, there will be 300 regular obstruent-final stems, and 60 obstruent-final exceptions, giving the rule attaching [-t] after obstruents a reliability of 300/360 = .83. Since 500 of the verbs are regular and 100 are irregular, the reliability of the rule attaching [-d] after any segment will be 500/600, which is also .83.</Paragraph> <Paragraph position="7"> When the model encounters the pair ([bɝn], [bɝnt]), this adds a sonorant-final stem employing the Ø → t change. The first step the model takes is to update reliability scores. Rules attaching [-t] after obstruents will be unaffected, since [n] is not an obstruent. The reliability of the rule attaching [-d] everywhere drops a minuscule amount, from 500/600 to 500/601. The second step is the fatal one: generalization with [bɝnt] gives rise to the new rule Ø → t / [-syllabic] __ #. This rule works correctly for 301 verbs (the 300 regular obstruent-final stems plus burnt), and fails for 210 verbs (the 60 obstruent-final exceptions, plus 150 verbs other than burnt that end in a sonorant consonant). Thus, its reliability would be 301/511, or .59.</Paragraph>
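<Paragraph> The arithmetic of the example is easy to verify: </Paragraph>
```python
# Reliabilities in the running example (600 verbs, plus burnt).
print(round(300 / 360, 2))   # 0.83  0 -> t after obstruents
print(round(500 / 601, 2))   # 0.83  0 -> d after any segment, once burnt is added
print(round(301 / 511, 2))   # 0.59  0 -> t after any consonant (overgeneralized)
```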
<Paragraph position="8"> The prediction therefore is that for novel verbs that end in sonorant consonants, such as pran [præn], pasts with [-t] (prant [prænt]) should be at least moderately acceptable as a second choice, after the regular pranned [prænd]. We believe that this prediction is wrong; prant strikes us as absurd.</Paragraph> </Section> <Section position="7" start_page="6" end_page="6" type="sub_section"> <SectionTitle> 3.7 Impugnment as a solution to the distributional encroachment problem </SectionTitle> <Paragraph position="0"> The problem we are faced with is to let the model identify cases of distributional encroachment as such, and not be fooled into grouping burnt and laughed together under the same [-t] generalization. Intuitively, the problem with the rule attaching [-t] after any consonant is that it is internally heterogeneous; it consists of one very consistent subset of cases (the obstruent-final stems) and one fundamentally different case (burnt). We can characterize internal heterogeneity more precisely if we compare the hits and scope of the "correct" rule (after obstruents) and the "spurious" rule (after any consonant): (8) Ø → t / [-sonorant] __ #: 300 hits / 360 scope; Ø → t / [-syllabic] __ #: 301 hits / 511 scope. We see that the rule adding [-t] after any consonant gains just one hit, but adds a significant number of exceptions (150).</Paragraph> <Paragraph position="1"> Formalizing this intuition, we propose a refinement of the way that confidence is calculated, in order to diagnose when a subpart of a generalization is doing most of the work of the larger generalization. When we consider the confidence of a context C associated with a change A → B, we must consider every other context C' associated with A → B, checking to see whether C' covers a subset of the cases that C covers. In the present case, when we assess the confidence of adding [-t] after any consonant, we would check all of the other rules adding [-t], including the one that adds [-t] after obstruents. For each C' that covers a subset of C, we must ask whether the rule A → B / C' is actually "doing most of the work" of the larger rule A → B / C.</Paragraph> <Paragraph position="2"> To find out if the smaller rule is doing most of the work, we calculate how well the larger rule (C) performs outside the area covered by the smaller rule (C'). The reliability of the residue area (C - C') is the ratio of the hits that fall outside the smaller rule to the scope that falls outside it: (hits(C) - hits(C')) / (scope(C) - scope(C')); in the present example, (301 - 300) / (511 - 360) = 1/151.</Paragraph>
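<Paragraph> Anticipating the comparison spelled out below, impugnment can be sketched as follows, using the same smoothing as the confidence sketch in section 3.2; the specific z value is again illustrative. </Paragraph>
```python
# Impugnment: compare the upper confidence limit of the residue with the
# lower confidence limit of the larger rule, and penalize the larger rule
# if the residue is sparse.
import math

Z = {0.75: 0.674}

def conf_limit(hits, scope, alpha=0.75, upper=False):
    p = (hits + 0.5) / (scope + 1.0)
    margin = Z[alpha] * math.sqrt(p * (1.0 - p) / scope)
    return p + margin if upper else p - margin

def impugned_confidence(big_hits, big_scope, sub_hits, sub_scope, alpha=0.75):
    """Confidence of the larger rule after checking one of its sub-rules."""
    big_lower = conf_limit(big_hits, big_scope, alpha)
    res_upper = conf_limit(big_hits - sub_hits, big_scope - sub_scope,
                           alpha, upper=True)
    # if even an optimistic view of the residue is worse than a pessimistic
    # view of the whole rule, the sub-rule is doing most of the work
    return res_upper if res_upper < big_lower else big_lower

# 0 -> t / [-syllabic] __ # (301/511) vs. 0 -> t / [-sonorant] __ # (300/360)
print(round(impugned_confidence(301, 511, 300, 360), 3))   # ~0.015
```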
<Paragraph position="4"> From the reliability of this residue area (C - C'), we can then calculate its confidence, using confidence limit statistics in a way similar to that described above in section 3.2. However, there is a crucial difference: when we are assessing whether a rule explains enough cases to be trustable, we are interested in the denseness of cases within the generalization. But when we are assessing whether a rule offers an improvement over a subpart, we are interested in the sparseness of cases in the residue outside of the subpart. Therefore, when calculating the confidence of the residue, we must use the upper confidence limit rather than the lower confidence limit.</Paragraph> <Paragraph position="5"> If the upper confidence limit of the reliability of the residue (C - C') is lower than the lower confidence limit of the reliability of the larger context (C), then we can infer that the smaller rule (A → B / C') is doing most of the work of the larger rule (A → B / C). Therefore, we penalize the larger rule by replacing its confidence value (Lower confidence(C)) with the confidence value of the residue (Upper confidence(C - C')). We call this penalty impugnment, because the validity of the larger rule is being called into question by the smaller rule. Impugnment is carried out for all contexts of all rules.</Paragraph> <Paragraph position="6"> This impugnment algorithm is similar to the pruning algorithm proposed by Anthony and Frisch (1997). However, their algorithm requires that the smaller rule cover at least as many positive cases (hits) as the larger rule. In this case, the larger rule does cover one more case than the smaller rule (the form burnt), so it would not be eligible for pruning under their system. Impugnment is also similar to the pruning strategies based on "minimum improvement" or "lift" (e.g., Bayardo, Agrawal and Gunopulos 1999), but in this case, we are considering the improvement of a more general (less specified) context, rather than a more specific one, and the criterion of improvement is built in rather than user-specified.</Paragraph> </Section> <Section position="8" start_page="6" end_page="6" type="sub_section"> <SectionTitle> 3.8 The status of impugnment </SectionTitle> <Paragraph position="0"> We find that in general, impugnment suffices to relegate forms of the burnt class to the status of minor irregular classes, and thus saves the model from serious overgeneralization. Since distributional encroachment appears to be common in languages (Albright and Hayes 1999), we feel that impugnment, or some other algorithm of equivalent effect, is crucial for accurate morphological learning.</Paragraph> <Paragraph position="1"> This said, we must add a somewhat puzzling postscript. In the experiment described below, we found that speakers gave forms like prant surprisingly high ratings. As a result, we found that we could achieve the closest match in modeling the experimental data by turning impugnment off. We feel that the high ratings for prant forms most likely were an artifact, reflecting the sociolinguistic status of burnt pasts (they are most often encountered by Americans as literary forms and may be felt to be prestigious). The upshot is that at present the empirical necessity of impugnment remains to be demonstrated.</Paragraph> </Section> </Section> </Paper>