<?xml version="1.0" standalone="yes"?> <Paper uid="P84-1070"> <Title>A DISCOVERY PROCEDURE FOR CERTAIN PHONOLOGICAL RULES</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 2. DISCOVERY PROCEDURES </SectionTitle> <Paragraph position="0"> This approach deals with acquisition without reference to a specific discovery procedure, and so in some sense the results of such research are general, in that in principle they apply to all discovery procedures. Still, I think that there is some utility in considering the problem of acquisition in terms of actual discovery procedures.</Paragraph> <Paragraph position="1"> Firstly, we can identify the parts of a grammar that are underspecified with respect to the available data. Parts of a grammar or a rule are strongly data determined if they are fixed or uniquely determined by the data, given the requirement that the overall grammar be empirically correct.</Paragraph> <Paragraph position="2"> By contrast, a part of a grammar or of a rule is weakly data determined if there is a large class of grammar or rule parts that are all consistent with the available data. For example, if there are two possible analyses that account equally well for the available data, then the choice of which of these analyses should be incorporated in the final grammar is weakly data determined. Strong or weak data determination is therefore a property of the grammar formalism and the data combined, and independent of the choice of discovery procedure.</Paragraph> <Paragraph position="3"> Secondly, a discovery procedure may partition a phonological system in an interesting way. For instance, in the discovery procedure described here the evaluation metric is not called upon to compare one grammar with another, but rather to make smaller, more local, comparisons. 
This leads to a factoring of the evaluation metric that may prove useful for its further investigation.</Paragraph> <Paragraph position="4"> Thirdly, focussing on discovery procedures forces us to identify what the surface indications of the various constructions in the grammar are. Of course, this does not mean one should look for a one-to-one correspondence between individual grammar constructions and the surface data, but rather for complexes of grammar constructions that interact to yield particular patterns on the surface. One is then investigating the logical implications of the existence of a particular construction in the data.</Paragraph> <Paragraph position="5"> Following from the last point, I think a discovery procedure should have a deductive rather than enumerative structure. In particular, procedures that work essentially by enumerating all possible (sub)grammars and seeing which ones work are not only in general very inefficient, but also not very insightful. These discovery-by-enumeration procedures simply give us as a result a list of all rule systems that are empirically adequate, but they give us no idea as to what properties of these systems were crucial in their being empirically adequate.</Paragraph> <Paragraph position="6"> This is because the structure imposed on the problem by a simple recursive enumeration procedure is in general not related to the intrinsic structure of the rule discovery problem.</Paragraph> </Section> <Section position="5" start_page="0" end_page="344" type="metho"> <SectionTitle> 3. A PHONOLOGICAL RULE DISCOVERY PROCEDURE </SectionTitle> <Paragraph position="0"> Below and in Appendix A I outline a discovery procedure, which I have fully implemented in Franz Lisp on a VAX 11/750 computer, for a restricted class of phonological rules, namely rules of the type shown in (1).</Paragraph> <Paragraph position="2"> An a in context C in the input to the rule appears as a b in the rule's output. 
Context C is a feature matrix, and to say that a appears in context C means that C is a subset of the feature matrix formed by the segments around a (footnote 1). A phonological system consists of an ordered set (footnote 2) of such rules, where the rules are considered to apply in a cascaded fashion, that is, the output of one rule is the input to the next.</Paragraph> <Paragraph position="3"> The problem the discovery procedure must solve is, given some data, to determine the set of rules. As an idealization, I assume that the input to the discovery procedure is a set of surface paradigms, each a two-dimensional array of words with all words in the same row possessing the same stem and all words in the same column the same affix. Moreover, I assume the root and suffix morphemes are already identified, although I admit this task may be non-trivial.</Paragraph> </Section> <Section position="6" start_page="344" end_page="344" type="metho"> <SectionTitle> 4. DETERMINING THE CONTEXT THAT CONDITIONS AN ALTERNATION </SectionTitle> <Paragraph position="0"> Consider the simplest phonological system: one in which only one phonological rule is operative. In this system the alternating segments a and b can be determined by inspection, since a and b will be the only alternating segments in the data (although there will be a systematic ambiguity as to which is a and which is b). Thus a and b are strongly data determined.</Paragraph> <Paragraph position="1"> Given a and b, we can write a set of equations that the rule context C that conditions this alternation must obey.</Paragraph> <Paragraph position="2"> Our rule must apply in all contexts C_b where a b appears that alternates with an a, since by hypothesis b was produced by this rule. We can represent this by equation (2).</Paragraph> <Paragraph position="3"> (2) $\forall C_b,\; C \text{ matches } C_b$ The second condition that our rule must obey is that it doesn't apply in any context C_a where an a appears. 
If it did, of course, we would expect a b, not an a, in this position on the surface. We can write this condition by equation (3). (3) $\forall C_a,\; C \text{ does not match } C_a$ These two equations define the rule context C. Note that in general these equations do not yield a unique value for C; depending upon the data there may be no C that simultaneously satisfies (2) and (3), or there may be several different C that simultaneously satisfy (2) and (3). We cannot appeal further to the data to decide which C to use, since they all are equally consistent with the data.</Paragraph> <Paragraph position="4"> Let us call the set of C that simultaneously satisfy (2) and (3) S_c. Then S_c is strongly data determined; in fact, there is an efficient algorithm for computing S_c from the C_a's and C_b's that does not involve enumerating and testing all imaginable C (the algorithm is described in Appendix A).</Paragraph> <Paragraph position="5"> However, if S_c contains more than one C, the choice of which C from S_c to actually use as the rule's context is weakly data determined. Footnote 1: What is crucial for what follows is that saying context C matches a portion of a word W is equivalent to saying that C is a subset of W. Since both rule contexts and words can be written as sets of features, I use &quot;contexts&quot; to refer both to rule contexts and to words.</Paragraph> <Paragraph position="6"> Footnote 2: I make this assumption as a first approximation. In fact, in real phonological systems phonological rules may be unordered with respect to each other.</Paragraph> <Paragraph position="7"> Moreover, the choice of which C from S_c to use does not affect any other decisions that the discovery procedure has to make - that is, nothing else in the complete grammar must change if we decide to use one C instead of another.</Paragraph> <Paragraph position="8"> Plausibly, the evaluation metric and universal principles decide which C to use in this situation. 
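Conditions (2) and (3) can be made concrete under the subset reading of matching given in footnote 1. The following is only a minimal sketch under that assumption, and it is not the Appendix A algorithm, which is not reproduced here. The function names and the feature labels are hypothetical. The sketch relies on one observation: any C satisfying (2) is a subset of the intersection of all the C_b, so S_c is non-empty exactly when that intersection itself satisfies (3).

```python
def largest_context(b_contexts):
    """Return the intersection of all contexts in which b appears.

    Under the subset reading of "matches", condition (2) says C is a subset
    of every C_b, so this intersection is the largest C satisfying (2).
    Assumes b_contexts is non-empty.
    """
    return frozenset.intersection(*(frozenset(cb) for cb in b_contexts))


def in_candidate_set(c, a_contexts, b_contexts):
    """Check whether context c satisfies both (2) and (3)."""
    c = frozenset(c)
    applies_to_every_b = c.issubset(largest_context(b_contexts))
    blocked_in_every_a = not any(c.issubset(ca) for ca in a_contexts)
    return applies_to_every_b and blocked_in_every_a


def candidate_set_is_empty(a_contexts, b_contexts):
    """S_c is empty exactly when the largest context already violates (3)."""
    top = largest_context(b_contexts)
    return not in_candidate_set(top, a_contexts, b_contexts)


# Hypothetical vowel nasalization data; feature names are illustrative only.
b_contexts = [
    {"next:+nasal", "next:+cons", "prev:+cons"},
    {"next:+nasal", "next:+cons", "prev:-cons"},
]
a_contexts = [
    {"next:-nasal", "next:+cons", "prev:+cons"},
]

top = largest_context(b_contexts)
```

In this toy data both the one-feature context containing only next:+nasal and the two-feature context returned by largest_context pass the check, so S_c has more than one member and the data alone leave the choice between them open.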
For example, if the alternation involves nasalization of a vowel, something that usually only occurs in the context of a nasal, and one of the contexts in S_c involves the feature nasal but the other C in S_c do not, a reasonable requirement is that the discovery procedure should select the context involving the feature nasal as the appropriate context C for the rule.</Paragraph> <Paragraph position="9"> Another possibility is that S_c's containing more than one member indicates to the discovery procedure that it simply has too little data to determine the grammar, and it defers making a decision on which C to use until it has the relevant data. The decision as to which of these possibilities is correct is not unimportant, and may have interesting empirical consequences regarding language acquisition.</Paragraph> <Paragraph position="10"> McCarthy (1981) gives some data on a related issue.</Paragraph> <Paragraph position="11"> Spanish does not tolerate word initial sC clusters, a fact which might be accounted for in two ways: either with a rule that inserts e before word initial sC clusters, or by a constraint on well-formed underlying structures (a redundancy rule) barring word initial sC. McCarthy reports that either constraint is adequate to account for Spanish morphophonemics, and there is no particular language internal evidence to prefer one over the other.</Paragraph> <Paragraph position="12"> The two accounts make differing predictions regarding the treatment of loan words. The e insertion rule predicts that loan words beginning with sC should receive an initial e (as they do: esnob, esmoking, esprey), while the well-formedness constraint makes no such prediction.</Paragraph> <Paragraph position="13"> McCarthy's evidence from Spanish therefore suggests that the human acquisition procedure can adopt one potential analysis and reject another without empirical evidence to distinguish between them. 
However, in the Spanish case, the two potential analyses differ as to which components of the grammar they involve (active phonological processes versus lexical redundancy rules), which affects the overall structure of the adopted grammar to a much greater degree than the choice of one C from S_c over another.</Paragraph> </Section> <Section position="7" start_page="344" end_page="346" type="metho"> <SectionTitle> 5. RULE ORDERING </SectionTitle> <Paragraph position="0"> In the last section I showed that a single phonological rule can be determined from the surface data. In practice, very few, if any, phonological systems involve only one rule.</Paragraph> <Paragraph position="1"> Systems involving more than one rule show complexity that single rule systems do not. In particular, rules may be ordered in such a fashion that one rule affects segments that are part of the context that conditions the operation of another rule. If a rule's context is visible on the surface (i.e.</Paragraph> <Paragraph position="2"> has not been destroyed by the operation of another rule) it is said to be transparent, while if a rule's context is no longer visible on the surface it is opaque. On the face of it, opaque contexts could pose problems for discovery procedures.</Paragraph> <Paragraph position="3"> Ordering of rules has been a topic of substantial research in phonology. My main objective in this section is to show that extrinsically ordered rules in principle pose no problem for a discovery procedure, even if later rules obscure the context of earlier ones. I don't make any claim that the procedure presented here is optimal - in fact I can think of at least two ways to make it perform its job more efficiently. The output of this discovery procedure is the set of all possible ordered rule systems (footnote 3) and their corresponding underlying forms that can produce the given surface forms.</Paragraph> <Paragraph position="4"> As before, 
I assume that the data is in the form of sets of paradigms. I also assume that for every rule changing an a to a b, an alternation between a and b appears in the data: thus we know by listing the alternations in the data just what the possible a's and b's of the rules are (footnote 4).</Paragraph> <Paragraph position="5"> From the assumption that rules are extrinsically ordered it follows that one of the rules must have applied last: that is, there is a unique &quot;most surfacy&quot; rule. The context of this rule will necessarily be transparent (visible in the surface forms), as there is no later rule to make its context opaque.</Paragraph> <Paragraph position="6"> Of course, the discovery procedure has no a priori way of telling which alternation corresponds to the most surfacy rule. Thus, although the identity of the segments involved in the most surfacy rule may be strictly data determined, at this stage this information is not available to the discovery procedure. So at this point, the discovery procedure proposed here systematically investigates all of the surface alternations: for each alternation it makes the hypothesis that it is the alternation of the most surfacy rule, checks that a context can be found that conditions this alternation (this must be so if the hypothesis is correct) using the single rule algorithm presented earlier, and then investigates if it is possible to construct an empirically correct set of rules based on this hypothesis.</Paragraph> <Paragraph position="7"> Given that we have found a potential &quot;most surfacy&quot; rule, all of the surface alternates are replaced by the putative underlying segment to form a set of intermediate forms, in which the rule just discovered has been undone. 
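The undoing step just described can be illustrated with a minimal sketch. It assumes, purely for illustration, that each paradigm row supplies the surface shapes of one stem as equal-length tuples of segments; the function name and the input format are hypothetical and are not taken from the Franz Lisp implementation.

```python
def undo_rule(stem_variants, a, b):
    """Replace surface b by underlying a wherever the two alternate.

    stem_variants: equal-length tuples of segments, the surface shapes of
    one stem across a paradigm row (a hypothetical input format).
    """
    length = len(stem_variants[0])
    # Positions at which exactly the segments a and b are both attested.
    alternating = {
        i for i in range(length)
        if {variant[i] for variant in stem_variants} == {a, b}
    }
    return [
        tuple(
            a if (i in alternating and seg == b) else seg
            for i, seg in enumerate(variant)
        )
        for variant in stem_variants
    ]


# Hypothetical row: a stem-final d surfaces as t in one cell.
row = [("t", "a", "d"), ("t", "a", "t")]
intermediate = undo_rule(row, a="d", b="t")
```

Only the final position is rewritten here; the initial t never alternates with d in this row, so it is left alone.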
We can undo this rule because we previously identified the alternating segments. Importantly, undoing this rule means that all other rules whose contexts had been made opaque in the surface data by the operation of the most surfacy rule will now be transparent. Footnote 3: Thus if the n rules in the system are unordered, this procedure returns n! solutions corresponding to the n! ways of ordering these rules.</Paragraph> <Paragraph position="8"> Footnote 4: The reason why the class of phonological rules considered in this paper was restricted to those mapping segments into segments was so that all alternations could be identified by simply comparing surface forms segment by segment. Thus in this discovery procedure the algorithm for identifying possible alternates can be of a particularly simple form. If we are willing to complicate the machinery that determines the possible alternations in some data, we can relax the restriction prohibiting epenthesis and deletion rules, and the requirement that all alternations are visible on the surface. That is, if the approach here is correct, the problem of identifying which segments alternate is a different problem to discovering the context that conditions the alternation.</Paragraph> <Paragraph position="9"> The hypothesis tester proceeds to look for another alternation, this time in the intermediate forms, rather than in the surface forms, and so on until all alternations have been accounted for.</Paragraph> <Paragraph position="10"> If at any stage the hypothesis tester fails to find a rule to describe the alternation it is currently working with, that is, the single-rule algorithm determines that no rule context exists that can capture this alternation, the hypothesis tester discards the current hypothesis, and tries another.</Paragraph> <Paragraph position="11"> The hypothesis tester is responsible for proposing different rule orderings, which are tested by applying the rules in reverse to arrive at 
progressively more removed representations, with the single-rule algorithm being applied at each step to determine if a rule exists that relates one level of intermediate representation with the next. We can regard the hypothesis tester as systematically searching through the space of different rule orderings, seeking rule orderings that successfully account for the observed data.</Paragraph> <Paragraph position="12"> The output of this procedure is therefore a list of all possible rule orderings. As I mentioned before, I think that the enumerative approach adopted here is basically flawed. So although this procedure is relatively efficient in situations where rule ordering is strictly data determined (that is, where only one rule ordering is consistent with the data), in situations where the rules are unordered (any rule ordering will do), the procedure will generate all possible n! orderings of the n rules.</Paragraph> <Paragraph position="13"> This was most striking while working with some Japanese data, with 6 distinct alternations, 4 of which were unordered with respect to each other. The discovery procedure, as presented above, required approximately 1 hour of CPU time to completely analyse this data: it found &lt;l different underlying forms and 512 different rule systems that generate the Japanese data, differing primarily in the ordering of the rules. This demonstrates that a discovery procedure that simply enumerates all possible rule orderings is failing to capture some important insight regarding rule ordering, since unordered rules are much more difficult for this type of procedure to handle, yet unordered rules are the most common situation in natural language phonology.</Paragraph> <Paragraph position="14"> This problem may be traced back to the assumption made above that a phonological system consists of an ordered set of rules. 
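How the search over orderings blows up can be seen in a minimal sketch of the hypothesis tester. It assumes the single-rule check is abstracted into a hypothetical predicate, here called can_be_most_surfacy, which stands in for the real test on intermediate forms. The rule labels are illustrative and this is not the original implementation.

```python
def enumerate_orderings(alternations, can_be_most_surfacy):
    """Yield every rule ordering the tester accepts, most surfacy rule first.

    can_be_most_surfacy(alt, remaining) is a hypothetical stand-in for the
    single-rule check: it says whether alt may be the last rule to apply,
    given the alternations still to be accounted for.
    """
    if not alternations:
        yield ()
        return
    for alt in alternations:
        remaining = tuple(x for x in alternations if x != alt)
        if can_be_most_surfacy(alt, remaining):
            for tail in enumerate_orderings(remaining, can_be_most_surfacy):
                yield (alt,) + tail


rules = ("r1", "r2", "r3", "r4")

# Unordered rules: any alternation may be undone first.
unordered = list(enumerate_orderings(rules, lambda alt, remaining: True))

# Strictly ordered rules: only the highest-numbered remaining rule is admissible.
ordered = list(
    enumerate_orderings(rules, lambda alt, remaining: all(alt > r for r in remaining))
)
```

With four mutually unordered rules the search returns all 24 orderings, whereas the strictly ordered case returns a single one, which is the asymmetry the text describes.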
The Japanese example shows that in many real phonological systems, the ordering of particular rules is simply not strongly data determined. What we need is some way of partitioning different rule orderings into equivalence classes, as was done with the different rule contexts in the single rule algorithm, and then compute with these equivalence classes rather than individual rule systems; that is, seek to localize the weak data determinacy.</Paragraph> <Paragraph position="15"> Looking at the problem in another way, we asked the discovery procedure to find all sets of ordered rules that generate the surface data, which it did. However, it seems that this simply was not the right question, since the answer to this question, a set of 512 different systems, is virtually uninterpretable by human beings. Part of the problem is that phonologists in general have not yet agreed what exactly the principles of rule ordering are.</Paragraph> <Paragraph position="16"> Still, the present discovery procedure, whatever its deficiencies, does demonstrate that rule ordering in phonology does not pose any principled insurmountable problems for discovery procedures (although the procedure presented here is certainly practically lacking in certain situations), even if a later rule is allowed to disturb the context of an earlier rule, so that the rule's context is no longer &quot;surface true&quot;. None the less, it is an empirical question as to whether phonology is best described in terms of ordered interacting rules; all that I have shown is that such systems are not in principle unlearnable.</Paragraph> </Section> </Paper>