<?xml version="1.0" standalone="yes"?> <Paper uid="W94-0206"> <Title>PARSING USING LINEARLY ORDERED PHONOLOGICAL RULES</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> PARSING USING LINEARLY ORDERED PHONOLOGICAL RULES </SectionTitle> <Paragraph position="0"/> </Section> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> A generate-and-test algorithm is described which parses a surface form into one or more lexical entries using linearly ordered phonological rules. This algorithm avoids the exponential expansion of search space which a naive parsing algorithm would face by encoding into the form being parsed the ambiguities which arise during parsing. The algorithm has been implemented and tested on real language data, and its speed compares favorably with that of a KIMMO-type parser.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 1. INTRODUCTION </SectionTitle> <Paragraph position="0"> A generate-and-test algorithm is described which uses linearly ordered phonological rules to parse a surface form into one or more underlying (lexical) forms.* Each step of the derivation may be rendered visible during both generation and test phases. The algorithm avoids an exponential expansion of search space during the generation phase by encoding the ambiguities which arise into the form being parsed. At the end of the generation phase, lexical lookup matches the ambiguous form against lexical entries. Because not all combinations of ambiguities in the parsed form are compatible, a test phase is used to filter * I have benefited from comments on previous versions of this paper by Alan Buseman, and several anonymous referees. Errors remain my own.</Paragraph> <Paragraph position="1"> forms found at lexical lookup. In this phase, the phonological rules are applied in forward order, and the derivations of any final forms which do not match the original input word are thrown out. The algorithm has been implemented and tested on real language data; its speed is comparable to that of a KIMMO-type parser.</Paragraph> </Section> <Section position="4" start_page="0" end_page="59" type="metho"> <SectionTitle> 2. THE PROBLEM </SectionTitle> <Paragraph position="0"> Since the publication of The Sound Pattern of English (Chomsky and Halle 1968), most generative linguists have held that the phonological rules of natural languages are linearly ordered (Bromberger and Halle 1989).</Paragraph> <Paragraph position="1"> That is, when deriving a surface form from an underlying (lexical) form, the input of the (N+1)th rule is the output of the Nth rule.</Paragraph> <Paragraph position="2"> While it is straightforward to derive a surface form from an underlying form with linearly ordered rules, complications arise in searching for the lexical form(s) from which a given surface form may be derived. One difficulty is that phonological rules are often neutralizing, so the result of &quot;unapplying&quot; such a rule during parsing is ambiguous.
Consider the following simple rule:</Paragraph> <Paragraph position="3"> [-continuant] → [-voiced] / ___ [-voiced] </Paragraph> <Paragraph position="4"> Unapplication of this devoicing rule to a noncontinuant voiceless segment presents a dilemma: should the underlying segment be reconstructed as having been [+voiced], or was the segment originally [-voiced] (with the rule having applied vacuously)? This dilemma arises under most theories with linearly ordered rules, whether segmental or autosegmental.</Paragraph> <Paragraph position="5"> A second difficulty for parsing is that if rules apply in linear order, later rules can obscure the effects of earlier rules. In the example given, a later rule might alter the environment in which the devoicing rule had applied, e.g. by voicing a segment which served as the environment for the first rule.</Paragraph> <Paragraph position="6"> This second problem arises in any theoretical framework which allows opaque rule orderings, that is, rule orders in which a later rule can opacify (obscure) the effects of earlier rules. Theories which disallow opaque rule orders (such as Natural Generative Phonology, see Hooper (1975)) have not enjoyed lasting popularity among linguists.</Paragraph> <Paragraph position="7"> The implication of these two problems is that parsing would appear to require a bifurcation of the search space for each feature value assigned in the output of a phonological rule. For instance, consider the above devoicing rule, followed by a voicing rule which opacifies the first rule. Suppose we have a surface sequence of a voiceless noncontinuant segment followed by a voiced segment. In parsing this sequence, it would seem that we must explore several paths. If the surface voiced segment were also underlyingly voiced (vacuous application of the voicing rule), then there is no further choice; the surface voiceless noncontinuant could not have been devoiced by the devoicing rule. But if the surface voiced segment were underlyingly voiceless (nonvacuous application of the voicing rule), then the first rule might have applied, either vacuously or nonvacuously. Given that languages may have tens of phonological rules, and that each rule may alter multiple features, the search space becomes enormous.</Paragraph> <Paragraph position="8"> Anderson (1988:5) summarizes the problem as follows: ...if the phonology of the language involves a nontrivial amount of neutralization... it is necessary to calculate all of the possible combinations of alternatives allowed by various rules, which may be individually large when neutralizations are involved and whose product grows exponentially as the amount of significant rule interaction (ordering) increases.</Paragraph> <Paragraph position="9"> The combinatorial possibilities involved in undoing the phonology thus get out of hand rather quickly. Since the depth of ordering in a linguistically motivated description can easily approach 15-20, with many of the rules involved being many-ways ambiguous when regarded from the &quot;wrong end,&quot; the approach of simply undoing the effects of the rules was soon seen to be quite impractical.</Paragraph> <Paragraph position="10"> But in fact this expansion of search space can be avoided by the use of a generate-and-test algorithm, in which the ambiguity resulting from the unapplication of each rule is encoded into the form when the rule is unapplied, as the sketch below illustrates.
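To make the encoding idea concrete, here is a minimal Python sketch (mine, not the paper's; the dict-based segment representation and all names are illustrative assumptions). Unapplying a rule simply uninstantiates the features its output sets, so a single form covers both the vacuous and the nonvacuous case:

```python
# A minimal sketch of the ambiguity-encoding idea, assuming segments are
# represented as dicts from feature names to boolean values, with None
# marking an uninstantiated (ambiguous) feature.

def unapply_output_features(segment, rule_output):
    """Uninstantiate every feature that the rule's output side sets.

    After unapplication the segment is compatible with BOTH the case in
    which the rule applied nonvacuously and the case in which it applied
    vacuously, so no branching of the search space is needed.
    """
    return {name: (None if name in rule_output else value)
            for name, value in segment.items()}

surface_t = {'continuant': False, 'voiced': False}   # a surface [t]
devoicing_output = {'voiced': False}                  # rule assigns [-voiced]

print(unapply_output_features(surface_t, devoicing_output))
# {'continuant': False, 'voiced': None}  -- underlyingly /t/ or /d/
```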
The resulting algorithm turns out to be tractable for the sorts of rules and rule orderings which arise in natural languages.</Paragraph> </Section> <Section position="5" start_page="59" end_page="59" type="metho"> <SectionTitle> 3. THE GENERATE-AND-TEST ALGORITHM </SectionTitle> <Paragraph position="0"> This section presents an algorithm for parsing with linearly ordered rules. The algorithm is efficient for the sorts of rule sets that have been proposed by generative phonologists for natural languages.</Paragraph> <Paragraph position="1"> The algorithm is presented in general terms, abstracting away from implementational details where possible. Where a certain degree of concreteness is unavoidable--as in the definitions of the application or unapplication of a single rule--alternative forms of the algorithm are mentioned.</Paragraph> </Section> <Section position="6" start_page="59" end_page="59" type="metho"> <SectionTitle> 3.1 DEFINITIONS AND INITIAL ASSUMPTIONS </SectionTitle> <Paragraph position="0"> An instantiated (phonetic) feature is a feature-name plus an atomic feature value; an uninstantiated feature is merely the feature-name. A segment-specification consists of a character representation of some segment (one or more characters, e.g. &quot;k&quot; or &quot;ch&quot;), plus a set of features, not all of which need be instantiated. An alphabet consists of a set of segment-specifications. A given language may employ more than one alphabet, distinguishing, for instance, an input (surface) alphabet from a lexical (underlying) alphabet.</Paragraph> <Paragraph position="1"> A (phonetic) word consists of a list of one or more segments, where each segment consists of a set of features. Input words (words to be parsed) and lexical words are usually represented instead in a character-based notation; the translation between this and a segment-based representation is defined below.</Paragraph> <Paragraph position="2"> A phonological rule consists of an input (left-hand) side, an output (right-hand) side, a left environment, and a right environment. The input and output side each consist of a set of one or more instantiated features. (The extension to lists of sets, representing an input or output of more than a single segment, is straightforward.</Paragraph> <Paragraph position="3"> Rules in which the input or the output is empty, i.e. epenthesis or deletion rules, are discussed later.) The environments of a rule consist of a sequence of zero or more sets of instantiated features or optional sequences, together with a Boolean specification of whether the environment must begin (left environment) or end (right environment) at a word boundary. An optional sequence consists of a sequence of one or more sets of features, together with a minimum (MIN) and maximum (MAX) number of times the optional sequence may appear.</Paragraph> <Paragraph position="4"> Finally, the analysis target of a rule is defined (for a rule with input and output of length one) as a set of features, which set consists of the features of the output, together with any non-contradictory features of the input. (In most rules, the features of the input and output are disjoint, so that the target consists of the union of the input and output features. Occasionally a rule will specify one value of a feature in the input, and a contrary value in the output. In that case, the analysis target takes the value of the feature in the output.)
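The following Python sketch (again mine; the Rule fields are assumptions chosen to mirror the definitions above) encodes a rule and computes its analysis target, with the output winning over any contradictory input feature:

```python
# A sketch of the definitions in this section, using the same feature-dict
# segment representation as the earlier sketch.

from dataclasses import dataclass, field

@dataclass
class Rule:
    input: dict          # instantiated features of the left-hand side
    output: dict         # instantiated features of the right-hand side
    left_env: list = field(default_factory=list)   # sequence of feature sets
    right_env: list = field(default_factory=list)
    left_at_boundary: bool = False   # must the left env start at word edge?
    right_at_boundary: bool = False

    @property
    def analysis_target(self):
        # output features plus any input features the output does not
        # override; on a conflict the output value takes precedence
        return {**self.input, **self.output}

devoice = Rule(input={'continuant': False},
               output={'voiced': False},
               right_env=[{'voiced': False}])
print(devoice.analysis_target)   # {'continuant': False, 'voiced': False}
```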
A rule is said to be self-opaquing if it could be applied nonvacuously to a segment of its environments.1 Such a rule must receive special treatment during analysis, because its application may have altered the word so that the output no longer meets the structural description of the rule.</Paragraph> <Paragraph position="5"> The list of rules of a language is linearly ordered, and given in synthesis order. That is, the input of the first rule is a word from the lexicon, the input of the second rule is the output of the first rule, etc., and the output of the last rule is a surface form.</Paragraph> </Section> <Section position="7" start_page="59" end_page="61" type="metho"> <SectionTitle> 3.2 TRANSLATION BETWEEN ALPHABETIC AND SEGMENTAL REPRESENTATIONS </SectionTitle> <Paragraph position="0"> A word in a phonetically based orthography (not, say, English orthography) may be translated into a segmental representation by the following algorithm: 1 The precise formulation of &quot;self-opaquing&quot; for the purposes of the algorithm is somewhat more restrictive. Self-opaquing rules cause difficulty for parsing because such a rule may apply (nonvacuously) to some segment, while in the output the rule seems not to have applied to that segment because the environment for that segment has itself been altered by the rule so that it no longer meets the structural description: a self-counterbleeding rule. This can only happen if a segment of the environment meets the structural description of the rule, and the structural change of the rule assigns a value contrary to the value required in the environment. That is, a segment of the rule's environment is unifiable with the structural description of the rule but not with the structural change.</Paragraph> <Paragraph position="1"> If the rule applies left-to-right iteratively, only the right environment is relevant, as only that environment can be altered after it has been used. Likewise, if the rule applies right-to-left iteratively, only the left environment is relevant. If the rule applies simultaneously, both environments are relevant. Beginning at the left end of the word, replace the longest substring which corresponds to the character representation of some segment-specification in the appropriate alphabet, with its set of features.</Paragraph> <Paragraph position="2"> Continue left to right, replacing substrings of the word with their features until the right end of the word is reached. If the process fails at any point (because no substring corresponds to a segment-specification), fail.</Paragraph> <Paragraph position="3"> This translation algorithm is deterministic, and would give wrong results for a word like &quot;mishap&quot; (assuming &quot;sh&quot;, &quot;s&quot; and &quot;h&quot; to be defined as segment-specifications). The algorithm could easily be made nondeterministic, with the proviso that each translation of an input word would be subjected to the remainder of the parsing algorithm.
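Here is a minimal sketch of the deterministic longest-match translation just described, under the same feature-dict representation; the toy alphabet is a hypothetical of mine:

```python
# A sketch of the deterministic translation algorithm: repeatedly strip
# the longest prefix that names a segment-specification in the alphabet,
# failing if no prefix matches (cf. the "mishap" caveat in the text).

ALPHABET = {
    's':  {'continuant': True,  'voiced': False},
    'h':  {'continuant': True,  'voiced': False},
    'sh': {'continuant': True,  'voiced': False},  # longest match wins
    'a':  {'continuant': True,  'voiced': True},
}

def to_segments(word, alphabet=ALPHABET):
    segments, i = [], 0
    while i < len(word):
        # try the longest candidate substring first
        for length in range(len(word) - i, 0, -1):
            spec = alphabet.get(word[i:i + length])
            if spec is not None:
                segments.append(dict(spec))
                i += length
                break
        else:
            raise ValueError('no segment-specification matches at %d' % i)
    return segments

print(len(to_segments('sha')))  # 2: "sh" is preferred over "s" + "h"
```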
However, how multiple translations of lexical words would be treated is not so clear.</Paragraph> <Paragraph position="4"> The translation between alphabetic and segmental representations could instead be done by a finite state transducer, with equivalent results.</Paragraph> </Section> <Section position="8" start_page="61" end_page="62" type="metho"> <SectionTitle> 3.3 UNAPPLICATION OF PHONOLOGICAL RULES </SectionTitle> <Paragraph position="0"> During the analysis phase of the algorithm, each rule is unapplied by uninstantiating, in each segment which matches the rule in the correct environment, those features which the right-hand (output) side of the rule sets. For instance, if a rule assigns the value [-voiced] in its output, during parsing the value of the feature &quot;voiced&quot; in the segments affected by the rule becomes uninstantiated.</Paragraph> <Paragraph position="1"> More specifically, given an input (surface) word in its segmental representation and a list of phonological rules, the rules may be unapplied to the word as follows.</Paragraph> <Paragraph position="2"> (1) Reverse the list of rules to give a list in analysis order.</Paragraph> <Paragraph position="3"> (2) Unapply the first rule of the list to the input word, using the algorithm below.</Paragraph> <Paragraph position="4"> (3) Unapply each succeeding rule to the output of the previous rule.</Paragraph> <Paragraph position="5"> The algorithm for the unapplication of a single rule in left-to-right iterative fashion (see Kenstowicz and Kisseberth 1979) is as follows; note that during analysis, a left-to-right iterative rule is applied right-to-left. For each segment S beginning at the right end of the word: If S is unifiable with the analysis target of the rule, and the left-hand environment of the rule matches against the word ending with the segment to the left of S, and the right-hand environment of the rule matches against the part of the word beginning with the segment to the right of S, then uninstantiate the features of S whose feature-names are contained in the output of the rule.</Paragraph> <Paragraph position="6"> An environment sequence matches a subsequence of segments during analysis if: For each member of the environment which is a set of features, that set unifies with the corresponding segment of the word; else (if the member is an optional sequence), the optional sequence matches against the corresponding sequence of segments between MIN and MAX number of times. If the environment must match at the margin of a word, then when the environment sequence is used up, the last segment matched must be the first segment of the word for the left environment, or the last segment for the right environment.</Paragraph> <Paragraph position="7"> After a rule has been unapplied to a word, if the rule is self-opaquing and the unapplication was nonvacuous, the rule is unapplied again until its unapplication is vacuous.</Paragraph> <Paragraph position="8"> The unapplication of a rule which applies right-to-left iteratively is the obvious transformation of the above algorithm.</Paragraph> <Paragraph position="9"> The important point in the unapplication of a single rule to a form is the use of unification, so that a segment in the word matches a feature set in the rule even if the value of one or more relevant features in the segment has been uninstantiated by the unapplication of a previous rule.
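The following sketch (mine) implements this single-rule unapplication for a left-to-right iterative rule, reusing the Rule class and the devoice rule from the earlier sketch; optional sequences, word-boundary flags, and the repeated unapplication of self-opaquing rules are omitted for brevity:

```python
# A sketch of single-rule unapplication for a left-to-right iterative rule
# (hence scanned right-to-left during analysis), assuming the feature-dict
# segments and Rule class defined above.

def unifiable(segment, featureset):
    """A segment unifies with a feature set if no instantiated feature
    clashes; an uninstantiated (None) feature is compatible with anything."""
    return all(segment.get(f) in (v, None) for f, v in featureset.items())

def env_matches(word, start, env):
    if start < 0 or start + len(env) > len(word):
        return not env          # a nonempty environment fell off the word
    return all(unifiable(word[start + k], fs) for k, fs in enumerate(env))

def unapply(rule, word):
    word = [dict(seg) for seg in word]          # do not mutate the input
    for i in range(len(word) - 1, -1, -1):      # right-to-left scan
        if (unifiable(word[i], rule.analysis_target)
                and env_matches(word, i - len(rule.left_env), rule.left_env)
                and env_matches(word, i + 1, rule.right_env)):
            for f in rule.output:               # uninstantiate output features
                word[i][f] = None
    return word

word = [{'continuant': True, 'voiced': True},    # a
        {'continuant': False, 'voiced': False},  # t
        {'continuant': False, 'voiced': False}]  # t
print(unapply(devoice, word)[1])  # {'continuant': False, 'voiced': None}
```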
Matching against an uninstantiated feature thus represents an assumption that the underlying value of that feature was correct. This assumption can only be validated during the synthesis phase, when a lexical entry from the lexicon will have become available.</Paragraph> <Paragraph position="11"> The unapplication of a rule which applies simultaneously to its input may be performed by either left-to-right or right-to-left iterative unapplication, although the unapplication may need to be repeated if the rule is self-opaquing. To see why the self-opaquing test might be necessary, consider the following hypothetical rule:</Paragraph> <Paragraph position="12"> [-continuant] → [+continuant] / ___ [-continuant] </Paragraph> <Paragraph position="13"> When applied simultaneously to the form apkpa, the result is afxpa. If the rule were unapplied to afxpa left-to-right iteratively, after the first pass we would have af[x k]pa, where the sequence [x k] is intended to represent a voiceless velar obstruent with an uninstantiated value for the feature [continuant] (hence ambiguous between the fricative x and the stop k). Only after a second pass would we get a[f p][x k]pa. (In this example the rule could have been unapplied right-to-left iteratively in a single pass, but a single right-to-left iterative application would have given the wrong result with the mirror image of the given rule.) As an alternative to the above algorithm, the unapplication of a single rule could be performed by a Finite State Transducer (FST) (Johnson 1972, cf. also Kaplan and Kay, in press). It will be more convenient to compare the FST method with the above algorithm when we consider the application of a rule (as opposed to its unapplication).</Paragraph> <Section position="1" start_page="62" end_page="62" type="sub_section"> <SectionTitle> 3.4 LEXICAL LOOKUP </SectionTitle> <Paragraph position="0"> A word, some of whose segments may be partially instantiated, matches against a word in the lexicon if the features of each of its segments are unifiable with the corresponding segment of the lexical word. Lexical lookup consists of finding all such matches.</Paragraph> <Paragraph position="1"> The unapplication of the phonological rules and the process of lexical lookup constitute the analysis phase of the algorithm.</Paragraph> </Section> </Section> <Section position="9" start_page="62" end_page="66" type="metho"> <SectionTitle> 3.5 APPLICATION OF PHONOLOGICAL RULES </SectionTitle> <Paragraph position="0"> As a result of the unapplication of rules to forms some of whose features may have been uninstantiated by earlier rules, some overgeneration may result, because a form taken from the lexicon may not have the value which was assumed during analysis. This overgeneration is filtered out by applying the rules in a synthesis phase. The degree of overgeneration is small, for reasons discussed in Maxwell (1991).
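Putting the pieces together, here is a sketch of the whole generate-and-test cycle as described so far, assuming the unapply() sketch above; seg_unify() and the lexicon argument are my own illustrative assumptions, and apply_rule() (the synthesis direction) is sketched in the next code block below:

```python
# A sketch of the full generate-and-test cycle: unapply the rules, look up
# candidates in the lexicon by unification, then rederive each candidate
# and keep only those whose derivation reproduces the surface word.

def seg_unify(a, b):
    """Two segments unify if no feature is instantiated to clashing values."""
    return all(a.get(f) is None or b.get(f) is None or a[f] == b[f]
               for f in set(a) | set(b))

def parse(surface, rules, lexicon):
    # ANALYSIS: unapply the rules in reverse (analysis) order
    form = surface
    for rule in reversed(rules):
        form = unapply(rule, form)

    # LEXICAL LOOKUP: keep entries unifiable with the ambiguous parsed form
    candidates = [entry for entry in lexicon
                  if len(entry) == len(form)
                  and all(seg_unify(e, s) for e, s in zip(entry, form))]

    # TEST: rederive each candidate in synthesis (forward) order and keep
    # it only if the derivation matches the original surface word exactly
    parses = []
    for entry in candidates:
        derived = entry
        for rule in rules:
            derived = apply_rule(rule, derived)
        if derived == surface:
            parses.append(entry)
    return parses
```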
The algorithm for applying rules during synthesis is straightforward: Given a lexical word and the list of rules, the first rule is applied to the lexical word, the second rule is applied to the output of the first, etc.</Paragraph> <Paragraph position="1"> The application of a single rule in left-to-right iterative fashion is as follows: For each segment S beginning at the left end of the word: If S contains all the features of the left-hand side of the rule, and the left and right environments match parts of the word immediately to the left and right of S, then set the value of each feature in S whose name appears in the output of the rule to the value in that output.</Paragraph> <Paragraph position="2"> An environment sequence matches during synthesis if: For each member of the environment which is a set of features, the corresponding segment of the word contains those same features; else (if the member is an optional sequence), the optional sequence matches against the corresponding segments of the word between MIN and MAX number of times. The condition on matching a word boundary is the same as during unapplication.</Paragraph> <Paragraph position="3"> Right-to-left iterative application is again the obvious transformation of this algorithm. Simultaneous application may be modeled by first collecting the set of all segments which satisfy the structural description of the rule, and then applying the output of the rule to each segment in that set.</Paragraph> <Paragraph position="4"> There is no need to check for possible reapplication of a rule during synthesis, as there was during analysis. This is because if the application of a rule creates new environments to which it might apply, those environments do not serve as further input for the rule apart from iteration or cyclic application. Directional iterative application is handled directly by the above algorithm, while nondirectional iterative application has generally been rejected by phonologists (cf. Johnson 1972: 35ff., and for a slightly different form of nondirectional iterative application, Kenstowicz and Kisseberth, 1979: 325). Cyclic application is not treated under the above algorithm, but would constitute only a restricted form of reapplication in which the application of a set of phonological rules would be sandwiched between each pair of cyclic morphological rules (as argued originally by Pesetsky 1979). If two or more cyclic morphological rules applied in a given word, the cyclic phonological rules would also apply at least twice. But each such application would be separated by the application of other rules, both phonological and morphological.</Paragraph> <Paragraph position="5"> I will refer to this algorithm for applying a single rule as the Target-First Application Algorithm, or TFAA; it is analogous to the algorithm given earlier for unapplication of a rule.</Paragraph> <Paragraph position="6"> As an alternative to the TFAA, each rule could instead be applied by an FST.</Paragraph> <Paragraph position="7"> A disadvantage of application of a rule by the TFAA, compared with its application by FST, is that when checking the left-hand environment (assuming the rule applies left-to-right iteratively), the TFAA must retest segments it has already considered as possible target segments. In other words, the TFAA backs up through the form when checking the left-hand environment.
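Before turning to the comparison with an FST, here is a minimal sketch of the TFAA for a single left-to-right iterative rule, reusing the Rule class and segment dicts from the earlier sketches (optional sequences and word-boundary checks again omitted):

```python
# A sketch of the Target-First Application Algorithm (TFAA) during
# synthesis.  Unlike analysis, features must actually be present and
# equal, not merely unifiable.

def contains(segment, featureset):
    return all(segment.get(f) == v for f, v in featureset.items())

def env_holds(word, start, env):
    if start < 0 or start + len(env) > len(word):
        return not env
    return all(contains(word[start + k], fs) for k, fs in enumerate(env))

def apply_rule(rule, word):
    word = [dict(seg) for seg in word]
    for i in range(len(word)):                  # left-to-right scan
        if (contains(word[i], rule.input)
                and env_holds(word, i - len(rule.left_env), rule.left_env)
                and env_holds(word, i + 1, rule.right_env)):
            word[i].update(rule.output)         # overwrite output features
    return word

b = {'continuant': False, 'voiced': True}
a = {'continuant': True, 'voiced': True}
print(apply_rule(devoice, [b, a]))   # unchanged: the right environment
                                     # (a following [-voiced]) fails on a
```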
Under those same circumstances, the FST need do no backing up when checking the left environment, as the applicability of the left environment is already determined when the FST arrives at a potential target. The distance the TFAA backs up can be considerable, in particular when the left environment (or the right environment, for a right-to-left iterative rule) has optional sequences (so that backtracking must be employed in case of failure to match the environment on the initial check), or when the word being parsed has &quot;optional&quot; segments. (Optional segments arise in analysis during the unapplication of deletion rules, as discussed later.) Both the FST and the TFAA may test the same segments multiple times when the right-hand environment is nonempty (assuming left-to-right iterative application). For the FST, this will only happen if it made an incorrect choice. An example would be the rule: [-continuant] → [-voiced] / ___ [-voiced]</Paragraph> <Paragraph position="9"> when applied to the form ba. After the FST tests the target, it could attempt to apply the rule by assigning the feature [-voiced] to the b (changing it to p). This would be incorrect, however, as the FST discovers when it processes the [+voiced] segment a; it must therefore back up, restore the [+voiced] value to the b, and move right to process the a again.</Paragraph> <Paragraph position="10"> The TFAA, applying the same rule to the same form, would first notice the potential target b. Before altering the value of the feature [voiced], however, it would check the right environment: the segment a. Noticing that it does not satisfy the requirement that the right environment be [-voiced], it refrains from altering the feature [voiced] on the b. It then goes on to check whether the a constitutes a potential target.</Paragraph> <Paragraph position="11"> However, the real question is not the worst case behavior, but the average case behavior: how many comparisons must be done for the average word with the average rule? Unfortunately, this is not a straightforward question. Examples are readily constructed in which the FST would do more comparisons than the TFAA. Given that in some cases the TFAA must back up through segments it has already considered while the FST need not, while in other cases the FST does more comparisons than the TFAA, I leave the question of average case behavior open. Note that similar considerations pertain to the behavior of the algorithm given earlier for the unapplication of rules.</Paragraph> <Paragraph position="12"> A potential advantage of the TFAA over an FST implementation concerns the debugging of a single rule. When scanning a word for possible rule applications, people often search first for segments matching the input side of the rule, then check whether the left and right environments of potential targets also match.</Paragraph> <Paragraph position="13"> This is essentially the method employed in the TFAA. If a rule is at all complicated, trying to apply it as an FST instead becomes quite difficult for humans. By the same token, determining why a parser did or did not apply a rule to a certain segment of a form should be much easier if the parser presents a trace of its application in the same form that the human would do it. This is of course only an advantage of the TFAA if the user is actually tracing a given rule.
Indeed the parser need not use the same algorithm to apply a rule when debugging is turned on as it uses when debugging is not turned on (although it is certainly easier on the writer of the parser if it does).</Paragraph> <Section position="1" start_page="66" end_page="66" type="sub_section"> <SectionTitle> 3.6 COMPARISON WITH INPUT FORM </SectionTitle> <Paragraph position="0"> Returning to the overall algorithm, specifically the test phase: the derivation of a word to which all the rules have been applied is correct if the derived word matches the original input word, that is, if each segment of the two words corresponds. A segment corresponds if each of its features is identical.</Paragraph> <Paragraph position="1"> During the test phase of the algorithm, a derived word may fail to match against the original (input) word under two circumstances: either one or more pairs of rules are opaquely ordered (see Maxwell 1991), or one or more rules are dependent on nonphonetic information, such as the location of a morpheme boundary or nonphonetic features. The resulting (potential) overgeneration is the reason for the test phase of the generate-and-test algorithm.</Paragraph> <Paragraph position="2"> This completes the discussion of the generate-and-test algorithm for feature-changing rules. The next two sections discuss some refinements.</Paragraph> </Section> <Section position="2" start_page="66" end_page="66" type="sub_section"> <SectionTitle> 3.7 EPENTHESIS AND DELETION RULES </SectionTitle> <Paragraph position="0"> During analysis, a segment which has been inserted by an epenthesis rule2 must be unepenthesized, while segments which may have been deleted must be re-inserted. To avoid bifurcation of the search for each such segment, segments may be assigned an additional feature called &quot;optional.&quot; All segments in the input word are marked [-optional]. When an epenthesis rule is unapplied (using an algorithm similar to that given above for feature-changing rules), the segments which might be epenthetic are marked as [+optional]. Similarly, a deletion rule may be unapplied by inserting a new segment with the set of features specified on the input side of the rule, and marking that segment as [+optional].</Paragraph> <Paragraph position="1"> The unapplication of deletion rules must be further constrained to prevent infinite looping.</Paragraph> <Paragraph position="2"> To take a concrete example, consider the following consonant cluster simplification rule:</Paragraph> <Paragraph position="3"> C → ∅ / C ___ C </Paragraph> <Paragraph position="4"> If this rule is un-applied to a surface form with a two consonant cluster, the result will be an intermediate form having a three consonant cluster. But the rule is self-opaquing, in the sense that it can delete consonants which form part of the environment. Hence during analysis, it should be allowed to re-unapply to its own output. But if the rule is allowed to un-apply to the intermediate form produced by its first unapplication, namely a three consonant cluster, it can un-apply in two places to yield a five-consonant cluster, to which the rule can again be unapplied, ad infinitum.</Paragraph> <Paragraph position="5"> 2 Pretheoretically, an epenthesis rule is a phonological rule which inserts a segment into a word. An example might be the insertion of p into warm+th to give [warmpθ].</Paragraph> <Paragraph position="6"> The best solution to this problem would be to use reasoning to determine the maximum number of contiguous consonants which could appear in the input to the rule.
But this is by no means simple. It would be straightforward to determine the maximum number of consonants which could appear in underlying forms (based on the maximum number of consonants which appear in lexical entries and in affixes, assuming a morphological component), and in fact the lexicon itself is often used for this purpose in KIMMO-based systems. However, with linearly ordered rules the number of adjacent consonants could in principle be increased by the application of certain rules preceding the deletion rule, including rules epenthesizing consonants, rules deleting vowels, and rules changing vowels into consonants. Whether such rules in fact exist, or whether they exist but would be blocked by other principles from creating inputs to such a consonant cluster simplification rule, is an area of research in phonology.</Paragraph> <Paragraph position="7"> In the absence of a principled way of determining the maximum number of consonants that could appear in a cluster (or analogous limits on other deletion rules), an ad hoc limit may be placed on the application of deletion rules. One such limit is to unapply a deletion rule simultaneously, and only once (or only N times). To take a concrete example, consider the input abbabba, where a is a vowel and b is a consonant. A single simultaneous unapplication of the above consonant cluster simplification rule would give abCbabCba, while two unapplications would give abCCCbabCCCba, where the first and third Cs in each cluster result from the second unapplication. Limiting the unapplication of deletion rules in this way is ad hoc, but probably sufficient for practical purposes.</Paragraph> <Paragraph position="8"> The presence of [+optional] segments arising from the unapplication of epenthesis and deletion rules slightly complicates the algorithm given earlier for rule unapplication, in that such segments may optionally be passed over when checking rule environments.</Paragraph> <Paragraph position="9"> During synthesis, epenthesis rules are straightforwardly applied by inserting a segment with the features of the output of the rule, while deletion rules are applied by simply deleting the relevant segments.</Paragraph> </Section> </Section> <Section position="10" start_page="66" end_page="66" type="metho"> <SectionTitle> 3.8 NONPHONETIC FEATURES, BOUNDARY MARKERS, ALPHA FEATURES ETC. </SectionTitle> <Paragraph position="0"> Nonphonetic (diacritic) features and obligatory boundary markers in rules may simply be ignored during analysis, leading to some overgeneration. In (manually) checking a number of such rules against large dictionaries, overgeneration appears to be surprisingly small, in fact virtually nil.</Paragraph> <Paragraph position="1"> Alpha variable features (commonly used in assimilation rules) may be modeled by the use of variables which become instantiated to the value of features in the appropriate segments, so that checking for a match during analysis is a matter of unification. During synthesis, a variable in the output of a rule results in the features of the corresponding segment of the word being set to the value to which the variable becomes instantiated in some other part of the rule.</Paragraph> </Section> <Section position="11" start_page="66" end_page="66" type="metho"> <SectionTitle> 4. AN IMPLEMENTATION OF THE ALGORITHM </SectionTitle>
<Paragraph position="0"> The generate-and-test algorithm has been implemented, as a parser which uses phonological rules of classical generative phonology, resembling those of Chomsky and Halle (1968) and much related work. (A sample rule is shown in the appendix.) I call the parser &quot;Hermit Crab.&quot; There is provision for feature-changing rules (including alpha variable rules), epenthesis rules, and deletion rules. Disjunctive rule ordering may be modeled, as well as simultaneous or directional iterative application. The environments of rules may incorporate optional sequences (such as (CV)).</Paragraph> <Paragraph position="1"> PC-KIMMO, an implementation of two-level phonology (Antworth 1990), was used to provide a comparison between parsing with linearly ordered generative phonological rules, and with two-level rules. Both PC-KIMMO and Hermit Crab run under MS-DOS.</Paragraph> <Paragraph position="2"> PC-KIMMO comes with example analyses of the phonologies of several languages, including Hebrew, Turkish, Japanese, and Finnish, each analysis containing from 16 to 27 two-level rules. The PC-KIMMO analyses were converted into analyses using linearly ordered generative rules, which were equivalent in the sense that they derived the surface forms from the same underlying forms. In most cases the linearly ordered rules were simpler than the two-level rules, in part because rule ordering rendered redundant some of the constraints necessitated by the two-level formalism. The number of rules for each language was reduced to between 7 and 11, as some two-level rules (such as default rules) are unneeded in a generative analysis, while others collapse into disjunctively ordered rule sets. For instance, PC-KIMMO has six rules for vowel harmony in Turkish: two for backness harmony in low vowels (one to make a low vowel [+back] in the appropriate environment, and one to make it [-back] in the opposite environment), and four rules for backness and rounding harmony in nonlow vowels. These collapse into two generative rules: one for backness harmony, which affects all vowels, and uses an alpha variable for the two possible values of the feature back; and one rule for rounding harmony, which affects nonlow vowels, again using an alpha variable for the two possible values of the feature round.</Paragraph> <Paragraph position="3"> Because the focus here is on phonological parsing, rather than morphological parsing, the morphological rules given in PC-KIMMO's sample analyses were ignored, and fully affixed forms were used for underlying forms, e.g.: <lex_entry shape &quot;oda+sH&quot; ... gloss &quot;room+POSS&quot;> In a sample of several hundred words, PC-KIMMO was about three times faster than the parser using linearly ordered rules. This difference is not large, and indeed may be attributed in part to the different programming languages used (PC-KIMMO is written in C, while the parser implementing the generate-and-test algorithm is written in Prolog and C). The ratio of 3:1 is approximately constant among the four grammars, and independent of word length, indicating that the results should scale.</Paragraph> </Section> </Paper>