<?xml version="1.0" standalone="yes"?>
<Paper uid="W98-1301">
  <Title>The Proper Treatment of Optimality in Computational Phonology</Title>
  <Section position="3" start_page="3" end_page="3" type="metho">
    <SectionTitle>
3 Optimality theory
</SectionTitle>
    <Paragraph position="0"> Optimality theory (Prince and Smolensky \[24\]) abandons rewrite rules. Rules are replaced by two new concepts: (1) a universal function called GEN and (2) a set of ranked universal constraints. GEN provides each input form with a (possibly infinite) set of output candidates. The constraints eliminate all but the best output candidate. Because many constraints are in conflict, it may be impossible for any candidate to satisfy all of them.</Paragraph>
    <Paragraph position="1"> The winner is determined by taking into consideration the language-specific ranking of the constraints. The winning candidate is the one with the least serious violations.</Paragraph>
    <Paragraph position="2"> In order to explore the computational aspects of the theory it is useful to focus on a concrete example, even simpler than the Yokuts vowel alternation we just discussed. 2 We will take the familiar case of syllabification constraints discussed by Prince and Smolensky \[24\] and many subsequent authors (Ellison \[6\], Tesar \[25\], Hammond \[8\]).</Paragraph>
    <Section position="1" start_page="3" end_page="3" type="sub_section">
      <SectionTitle>
3.1 GEN for syllabification
</SectionTitle>
      <Paragraph position="0"> We assume that the input to GEN consists of strings of vowels V and consonants C. GEN allows each segment to play a role in the syllable or to remain "unparsed". A syllable contains at least a nucleus and possibly an onset and a coda.</Paragraph>
      <Paragraph position="1"> Let us assume that GEN marks these roles by inserting labeled brackets around each input element. An input consonant such as b will have three outputs: O\[b\] (onset), D\[b\] (coda), and X\[b\] (unparsed). Each vowel such as a will have two outputs, N\[a\] (nucleus) and X\[a\] (unparsed). In addition, GEN "overparses", that is, it freely inserts empty onset O\[ \], nucleus N\[ \], and coda D\[ \] brackets.</Paragraph>
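      <Paragraph> The role assignments just described can be emulated with a small brute-force enumerator. The sketch below is a bounded toy approximation in Python, not the Xerox construction: every segment is assigned a role, at most one empty bracket pair is inserted per gap (the real GEN overparses without bound and denotes an infinite relation), and only candidates that respect syllable structure are kept. All names and the regular expression are ours, and the resulting candidate sets need not coincide exactly with the paper's figures.

```python
import itertools
import re

# A bounded toy approximation of GEN (our own sketch, not the Xerox
# construction).  Each consonant may be parsed as an onset O[c], a coda
# D[c], or left unparsed X[c]; each vowel as a nucleus N[v] or unparsed
# X[v].  Overparsing is capped at one empty O[ ], N[ ], or D[ ] bracket
# pair per gap, so the candidate set is finite.

VOWELS = set("aeiou")

def roles(segment):
    """Possible parses of a single input segment."""
    if segment in VOWELS:
        return ["N[%s]" % segment, "X[%s]" % segment]
    return ["O[%s]" % segment, "D[%s]" % segment, "X[%s]" % segment]

EMPTY = ["", "O[ ]", "N[ ]", "D[ ]"]

# SyllableStructure: one or more syllables, each with an optional onset,
# an obligatory (possibly empty) nucleus, and an optional coda, with
# unparsed elements allowed anywhere; or a wholly unparsed string.
SYLLABLES = re.compile(
    r"(?:(?:X\[.\])*(?:O\[.?\])?(?:X\[.\])*N\[.?\]"
    r"(?:X\[.\])*(?:D\[.?\])?(?:X\[.\])*)+"
    r"|(?:X\[.\])+")

def gen(word):
    """Enumerate the bounded candidate set for an input string."""
    candidates = set()
    for parse in itertools.product(*(roles(s) for s in word)):
        for fill in itertools.product(EMPTY, repeat=len(word) + 1):
            pieces = [fill[0]]
            for role, gap in zip(parse, fill[1:]):
                pieces.append(role)
                pieces.append(gap)
            candidate = "".join(pieces)
            if SYLLABLES.fullmatch(candidate):
                candidates.add(candidate)
    return candidates
```

For the single-vowel input a this already produces candidates such as N\[a\], O\[ \]N\[a\], and the wholly unparsed X\[a\], while rejecting, say, a lone onset O\[b\] for the input b, since a syllable needs a nucleus.</Paragraph>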
      <Paragraph position="2"> For the sake of concreteness, we give here an explicit definition of GEN using the notation of the Xerox regular expression calculus (Karttunen et al. \[15\]). We define GEN as the composition of four simple components: Input, Parse, OverParse, and SyllableStructure.</Paragraph>
      <Paragraph position="3"> The definitions of the first three components are shown in Figure 4.</Paragraph>
      <Paragraph position="4">  because rounding depends on the height of the stem vowel in the underlying representation. Cole and Kisseberth offer a baroque version of the two-level solution. McCarthy strives mightily to distinguish his "sympathy" candidates from the intermediate representations postulated by the rewrite approach.</Paragraph>
      <Paragraph position="6"> A replace expression of the type A -&gt; B ... C in the Xerox calculus denotes a relation that wraps the prefix strings in B and the suffix strings in C around every string in A.</Paragraph>
      <Paragraph position="7"> Thus Parse is a transducer that inserts appropriate bracket pairs around input segments.</Paragraph>
      <Paragraph position="8"> Consonants can be onsets, codas, or be ignored. Vowels can be nuclei or be ignored.</Paragraph>
      <Paragraph position="9"> OverParse optionally inserts unfilled onsets, codas, and nuclei. The dotted brackets \[. .\] specify that only a single instance of a given bracket pair is inserted at any position.</Paragraph>
      <Paragraph position="11"> The role of the third GEN component, SyllableStructure, is to constrain the output of Parse and OverParse. A syllable needs a nucleus; onsets and codas are optional; they must be in the right order; unparsed elements may occur freely. For the sake of clarity, we define SyllableStructure with the help of four auxiliary terms (Figure 5).</Paragraph>
      <Paragraph position="12">  Round parentheses in the Xerox regular expression notation indicate optionality. Thus (C) in the definition of Onset indicates that onsets may be empty or filled with a consonant. Similarly, (Onset) in the definition of SyllableStructure means that a syllable may or may not have an onset. The effect of the / operator is to allow unparsed consonants and vowels to occur freely within a syllable. The disjunction \[C | V\] in the definition of Unparsed allows consonants and vowels to remain unparsed.</Paragraph>
      <Paragraph position="13"> With these preliminaries we can now define GEN as a simple composition of the four components (Figure 6).</Paragraph>
      <Paragraph position="14">  With the appropriate definitions for C (consonants) and V (vowels), the expression in Figure 6 yields a transducer with 22 states and 229 arcs.</Paragraph>
      <Paragraph position="15"> It is not necessary to include Input in the definition of GEN, but it has a technically beneficial effect. The constraints have less work to do when it is made explicit that the auxiliary bracket alphabet is not included in the input.</Paragraph>
      <Paragraph position="16"> Because GEN over- and underparses with wild abandon, it produces a large number of output candidates even for very short inputs. For example, the string a composed with GEN yields a relation with 14 strings on the output side (Figure 7).</Paragraph>
      <Paragraph position="17">  The number of output candidates for abracadabra is nearly 1.7 million, although the network representing the mapping has only 193 states. It is evident that working with finite-state tools has a significant advantage over manual tableau methods.</Paragraph>
    </Section>
    <Section position="2" start_page="3" end_page="3" type="sub_section">
      <SectionTitle>
3.2 Syllabification constraints
</SectionTitle>
      <Paragraph position="0"> The syllabification constraints of Prince and Smolensky \[24\] can easily be expressed as regular expressions in the Xerox calculus. Figure 8 lists the five constraints with their translations.</Paragraph>
      <Paragraph position="1">  The definition of the HaveOns constraint uses the restriction operator =&gt;. It requires that any occurrence of the nucleus bracket, N\[, must be immediately preceded by a filled O\[C\] or unfilled O\[ \] onset. The definitions of the other four constraints are composed of the negation operator ~ and the contains operator $. For example, the NoCoda constraint, ~$D\[, can be read as "does not contain D\[". The FillNuc and FillOns constraints forbid empty nucleus N\[ \] and onset O\[ \] brackets.</Paragraph>
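      <Paragraph> For illustration, the five constraints can also be written as plain Python predicates over candidate strings in the bracket notation. This is a toy rendering of Figure 8 under our own naming; the Xerox calculus of course compiles the constraints into finite-state networks rather than testing strings one at a time.

```python
import re

# Toy string-level predicates for the five constraints of Figure 8.
# Candidates use the notation O[...] onset, N[...] nucleus, D[...] coda,
# X[...] unparsed; "O[ ]" is an empty (overparsed) onset.

def have_ons(s):
    # HaveOns: every nucleus bracket N[ must be immediately preceded
    # by a filled O[C] or empty O[ ] onset (the => restriction).
    return all(re.fullmatch(r"O\[.\]", s[m.start() - 4:m.start()])
               for m in re.finditer(r"N\[", s))

def no_coda(s):
    return "D[" not in s          # ~$D[ : contains no coda bracket

def fill_nuc(s):
    return "N[ ]" not in s        # no empty nucleus

def parse(s):
    return "X[" not in s          # no unparsed material

def fill_ons(s):
    return "O[ ]" not in s        # no empty onset
```

For example, have_ons accepts O\[p\]N\[a\] and the onset-free string X\[a\] (vacuously, since it contains no nucleus bracket) but rejects a bare N\[a\].</Paragraph>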
      <Paragraph position="2"> These constraints compile into very small networks; the largest one, HaveOns, contains four states. Each constraint network encodes an infinite regular language. For example, the HaveOns language includes all strings of any length that contain no instances of N\[ at all and all strings of any length in which every instance of N\[ is immediately preceded by an onset.</Paragraph>
      <Paragraph position="3"> The identity relations on these constraint languages can be thought of as filters. For example, the identity relation on HaveOns maps all HaveOns strings into themselves and blocks on all other strings. In the following section, we will in fact consistently treat the constraint networks as representing identity relations.</Paragraph>
    </Section>
    <Section position="3" start_page="3" end_page="3" type="sub_section">
      <SectionTitle>
3.3 Constraint application
</SectionTitle>
      <Paragraph position="0"> Having defined GEN and the five syllabification constraints, we are now in a position to address the main issue: how are optimality constraints applied?</Paragraph>
      <Paragraph position="1"> Given that GEN denotes a relation and that the constraints can be thought of as identity relations on sets, the simplest idea is to proceed in the same way as with the rewrite rules in Figure 2. We could compose GEN with the constraints to yield a transducer that maps each input to its most optimal realization, letting the ordering of the constraints in the cascade implement their ranking (Figure 9).</Paragraph>
      <Paragraph position="2">  But it is immediately obvious that composition does not work here as intended. The 6-state transducer illustrated in Figure 9 works fine on inputs such as panama, yielding O\[p\]N\[a\]O\[n\]N\[a\]O\[m\]N\[a\], but it fails to produce any output on inputs like america that fail on some constraint. Only strings that have a perfect output candidate survive this merciless cascade. We need to replace composition with some new operation to make this schema work correctly.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="3" end_page="3" type="metho">
    <SectionTitle>
4 Lenient composition
</SectionTitle>
    <Paragraph position="0"> The necessary operation, let us call it lenient composition, is not difficult to construct, but to our knowledge it has not previously been defined. Frank and Satta \[7\] come very close but do not take the final step to encapsulate the notion. Hammond \[8\] has the idea but lacks the means to spell it out in formal terms.</Paragraph>
    <Paragraph position="1"> As the first step toward defining lenient composition, let us review an old notion called priority union (Kaplan \[12\]). This term was originally defined as an operation for unifying two feature structures in a way that eliminates any risk of failure by stipulating that one of the two has priority in case of a conflict. 3 A finite-state version of this notion has proved very useful in the management of transducer lexicons (Kaplan and Newman \[11\]).</Paragraph>
    <Paragraph position="2"> Let us consider the relations Q and R depicted in Figure 10. The Q relation maps a to x and b to y. The R relation maps b to z and c to w. The priority union of Q and R, denoted Q .P. R, maps a to x, b to y, and c to w. That is, it includes all the pairs from Q and every pair from R that has as its upper element a string that does not occur as the upper string of any pair in Q. If some string occurs as the upper element of some pair in both Q and R, the priority union of Q and R only includes the pair in Q. Consequently Q .P. R in Figure 10 maps b to y instead of z.</Paragraph>
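    <Paragraph> If a regular relation is modeled, purely for illustration, as a finite set of (upper, lower) string pairs, priority union is a few lines of Python. The representation is ours; the finite-state operation works on regular relations of arbitrary, even infinite, size.

```python
# Toy model: a "relation" is a finite set of (upper, lower) string pairs.

def upper_language(rel):
    """The set of upper-side strings of a relation."""
    return {u for (u, _) in rel}

def priority_union(q, r):
    """Q .P. R: all of Q, plus those pairs of R whose upper string
    does not occur on the upper side of Q."""
    return q | {(u, l) for (u, l) in r if u not in upper_language(q)}

# The example of Figure 10: Q maps a to x and b to y;
# R maps b to z and c to w.
Q = {("a", "x"), ("b", "y")}
R = {("b", "z"), ("c", "w")}
# Q .P. R maps a to x, b to y (Q wins the conflict), and c to w.
```

Note that the operation is asymmetric: R .P. Q would instead keep R's mapping of b to z.</Paragraph>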
  </Section>
  <Section position="5" start_page="3" end_page="3" type="metho">
    <SectionTitle>
3 The DPATR system at SRI (Karttunen \[16\]) had the same operation with a less respectable title, it
</SectionTitle>
    <Paragraph position="0"/>
    <Paragraph position="2"> The .u operator in Figure 11 extracts the "upper" language from a regular relation. Thus the expression ~\[Q.u\] denotes the set of strings that do not occur on the upper side of the Q relation. The effect of the composition in Figure 11 is to restrict R to mappings that concern strings that are not mapped to anything in Q. Only this subset of R is unioned with Q.</Paragraph>
    <Paragraph position="3"> We define the desired operation, lenient composition, denoted . 0., as a combination of ordinary composition and priority union (Figure 12).</Paragraph>
    <Paragraph position="4"> R .O. C = \[R .o. C\] .P. R  Figure 12. Definition of lenient composition. To better visualize the effect of the operation defined in Figure 12, one may think of the relation R as a set of mappings induced by GEN and the relation C as one of the constraints defined in Figure 8. The left side of the priority union, \[R .o. C\], restricts R to mappings that satisfy the constraint. That is, any pair whose lower-side string is not in C will be eliminated. If some string in the upper language of R has no counterpart on the lower side that meets the constraint, then it is not present in \[R .o. C\].u but, for that very reason, it will be "rescued" by the priority union. In other words, if an underlying form has some output that can meet the given constraint, lenient composition enforces the constraint. If an underlying form has no output candidates that meet the constraint, then the underlying form and all its outputs are retained. The definition of lenient composition entails that the upper language of R is preserved in R .O. C.</Paragraph>
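    <Paragraph> In the same toy finite-set model, with a constraint rendered as a Python predicate rather than an identity transducer (an illustration of ours, not the Xerox implementation), the definition of Figure 12 becomes:

```python
def upper_language(rel):
    return {u for (u, _) in rel}

def priority_union(q, r):
    """Q .P. R on finite relations, as sketched earlier."""
    return q | {(u, l) for (u, l) in r if u not in upper_language(q)}

def compose_constraint(rel, constraint):
    """R .o. C where C is the identity relation on a language:
    keep the pairs whose lower string satisfies the constraint."""
    return {(u, l) for (u, l) in rel if constraint(l)}

def lenient_compose(rel, constraint):
    """R .O. C = [R .o. C] .P. R  (Figure 12)."""
    return priority_union(compose_constraint(rel, constraint), rel)

# A hypothetical relation: "a" has outputs with and without a mark "*",
# while "b" has only marked outputs.
R = {("a", "a1"), ("a", "*a2"), ("b", "*b1")}
unmarked = lambda s: "*" not in s
# "a" keeps only its unmarked output; "b", having no unmarked output,
# is rescued by the priority union and keeps everything it had.
```

The test of the definition is exactly the property stated above: the upper language of R survives intact, while each upper string keeps only its best available lowers.</Paragraph>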
    <Paragraph position="5"> Many people, including Hammond \[8\] and Frank and Satta \[7\], have independently had a similar idea without conceiving it as a finite-state operation. 4 If one already knows about priority union, lenient composition is an obvious idea.</Paragraph>
    <Paragraph position="6"> Let us illustrate the effect of lenient composition starting with the example in Figure 7. The composition of the input a with GEN yields a relation that maps a to the 14 outputs in Figure 7. We will leniently compose this relation with each of the constraints in the order of their ranking, starting with the HaveOns constraint (Figure 13). The lower-case operator .o. stands for ordinary composition, the upper-case .O. for lenient composition. As Figure 13 illustrates, applying HaveOns by lenient composition removes most of the 14 output candidates produced by GEN. The resulting relation maps a to two outputs, O\[ \]N\[a\] and O\[ \]N\[a\]D\[ \]. The next highest-ranking constraint, NoCoda, removes the latter alternative. The twelve candidates that were eliminated by the first lenient composition are no longer under consideration.</Paragraph>
    <Paragraph position="7"> 4 Hammond implements a pruning operation that removes output candidates under the condition that "pruning cannot reduce the candidate set to null" (p. 13). Frank and Satta (p. ?) describe a process of "conditional intersection" that enforces a constraint if it can be met and does nothing otherwise.  The next two constraints in the sequence, FillNuc and Parse, obviously do not change the relation because the one remaining output candidate, O\[ \]N\[a\], satisfies them. Up to this point, the distinction between lenient and ordinary composition does not make any difference because we have not exhausted the set of output candidates. However, when we bring in the last constraint, FillOns, the right half of the definition in Figure 12 has to come to the rescue; otherwise there would be no output for a.</Paragraph>
    <Paragraph position="8"> This example demonstrates that the application of optimality constraints can be thought of as a cascade of lenient compositions that carry down an ever-decreasing number of output candidates without allowing the set to become empty. Instead of intermediate representations (cf. Figure 1) there are intermediate candidate populations corresponding to the columns in the left-to-right ordering of the constraint tableau.</Paragraph>
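    <Paragraph> The whole cascade can be replayed in the toy model. The sketch below is our illustration: the candidate list is a hand-picked subset of GEN's outputs for a, and the constraints are string predicates rather than compiled networks, applied strictly in rank order.

```python
import re

# Finite-relation helpers, as in the earlier sketches.
def upper_language(rel):
    return {u for (u, _) in rel}

def priority_union(q, r):
    return q | {(u, l) for (u, l) in r if u not in upper_language(q)}

def lenient_compose(rel, constraint):
    satisfied = {(u, l) for (u, l) in rel if constraint(l)}
    return priority_union(satisfied, rel)

# Toy versions of the ranked constraints.
def have_ons(s):
    return all(re.fullmatch(r"O\[.\]", s[m.start() - 4:m.start()])
               for m in re.finditer(r"N\[", s))

constraints = [
    have_ons,                       # HaveOns
    lambda s: "D[" not in s,        # NoCoda
    lambda s: "N[ ]" not in s,      # FillNuc
    lambda s: "X[" not in s,        # Parse
    lambda s: "O[ ]" not in s,      # FillOns
]

# A hand-picked subset of GEN's output candidates for the input a.
R = {("a", c) for c in [
    "N[a]", "N[a]D[ ]", "O[ ]N[a]", "O[ ]N[a]D[ ]", "X[a]", "X[a]N[ ]",
]}
for c in constraints:
    R = lenient_compose(R, c)
# Every remaining candidate violates FillOns, yet lenient composition
# never empties the set: the sole survivor is O[ ]N[a].
```

NoCoda eliminates O\[ \]N\[a\]D\[ \], Parse eliminates the unparsed candidates, and the final FillOns step, which no surviving candidate satisfies, is exactly where the rescue clause of the definition takes effect.</Paragraph>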
    <Paragraph position="9"> Instead of applying the constraints one by one to the output provided by GEN for a particular input, we may also leniently compose the GEN relation itself with the constraints. Thus the suggestion made in Figure 9 is (nearly) correct after all, provided that we replace ordinary composition with lenient composition (Figure 14).</Paragraph>
    <Paragraph position="10">  The composite single transducer shown in Figure 14 maps a and any other input directly into its viable outputs without ever producing any failing candidates.</Paragraph>
  </Section>
  <Section position="6" start_page="3" end_page="10" type="metho">
    <SectionTitle>
5 Multiple violations
</SectionTitle>
    <Paragraph position="0"/>
    <Paragraph position="2"> However, we have not yet addressed one very important issue. It is not sufficient to obey the ranking of the constraints. If two or more output candidates violate the same constraint multiple times, we should prefer the candidate or candidates with the smallest number of violations. This does not come for free. The system that we have sketched so far does not make that distinction. If the input form has no perfect outputs, we may get a set of outputs that differ with respect to the number of constraint violations. For example, the transducer in Figure 14 gives three outputs for the string bebop (Figure 15).</Paragraph>
    <Paragraph position="3">  Because bebop has no output that meets the Parse constraint, lenient composition allows all outputs that contain a Parse violation regardless of the number of violations. Here the second alternative with just one violation should win but it does not.</Paragraph>
    <Paragraph position="4"> Instead of viewing Parse as a single constraint, we need to reconstruct it as a series of ever more relaxed parse constraints. The ^&gt;n operator in Figure 16 means "more than n iterations".</Paragraph>
    <Paragraph position="5">  Our original Parse constraint is violated by a single unparsed element. Parse1 allows one unparsed element, Parse2 allows up to two violations, and ParseN up to N violations. The single Parse line in Figure 14 must be replaced by the sequence of lenient compositions in Figure 17 up to some chosen N.</Paragraph>
    <Paragraph position="6">  Figure 17. Gradient Parse constraint If an input string has at least one output form that meets the Parse constraint (no violations), all the competing output forms with Parse violations are eliminated. Failing that, if the input string has at least one output form with just one violation, all the outputs with more violations are eliminated. And so on.</Paragraph>
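    <Paragraph> In the toy model, the gradient family is a sequence of progressively weaker predicates that count unparsed brackets, leniently composed strictest first. This is our sketch: the candidate outputs for bebop are illustrative stand-ins with 1, 2, and 5 violations, with the winner reported by the paper among them.

```python
# Gradient Parse in the toy finite-relation model.
def upper_language(rel):
    return {u for (u, _) in rel}

def priority_union(q, r):
    return q | {(u, l) for (u, l) in r if u not in upper_language(q)}

def lenient_compose(rel, constraint):
    satisfied = {(u, l) for (u, l) in rel if constraint(l)}
    return priority_union(satisfied, rel)

def parse_at_most(n):
    """ParseN: tolerate at most n unparsed elements X[...]."""
    return lambda s: n >= s.count("X[")

# Illustrative output candidates for bebop with 1, 2, and 5 Parse
# violations; only the violation counts matter for the demonstration.
R = {
    ("bebop", "O[b]N[e]O[b]N[o]X[p]"),       # 1 violation
    ("bebop", "X[b]N[e]O[b]N[o]X[p]"),       # 2 violations
    ("bebop", "X[b]X[e]X[b]X[o]X[p]"),       # 5 violations
}
for n in range(6):          # Parse, Parse1, ..., Parse5
    R = lenient_compose(R, parse_at_most(n))
# The candidate with the fewest violations survives.
```

Parse itself (n = 0) eliminates nothing because no candidate satisfies it; Parse1 then keeps only the single-violation candidate, and the weaker constraints leave that result unchanged, in line with the subset relation among the constraint languages.</Paragraph>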
    <Paragraph position="7"> The particular order in which the individual parse constraints apply actually has no effect here on the final outcome because the constraint languages are in a strict subset relation: Parse is a subset of Parse1, Parse1 of Parse2, and so on. If a candidate has at most two violations, it is in Parse2 and in all the weaker constraints. The ranking in Figure 17 determines only the order in which the losing candidates are eliminated. If we start with the strictest constraint, all the losers are eliminated at once when Parse2 is applied; if we start with a weaker constraint, some output candidates will be eliminated earlier than others but the winner remains the same.</Paragraph>
    <Paragraph position="8"> As the number of constraints goes up, so does the size of the combined constraint network in Figure 14, from 66 states (no Parse violations) to 248 (at most five violations). It maps bebop to O\[b\]N\[e\]O\[b\]N\[o\]X\[p\] and abracadabra to O\[ \]N\[a\]X\[b\]O\[r\]N\[a\]O\[c\]N\[a\]O\[d\]N\[a\]X\[b\]O\[r\]N\[a\] correctly and instantaneously. It is immediately evident that while we can construct a cascade of constraints that prefer n violations to n+1 violations up to any given n, there is no way in a finite-state system to express the general idea that fewer violations is better than more violations. As Frank and Satta \[7\] point out, finite-state constraints cannot make infinitely many distinctions of well-formedness. It is not likely that this limitation is a serious obstacle to practical optimality computations with finite-state systems, as the number of constraint violations that need to be taken into account is generally small.</Paragraph>
    <Paragraph position="9"> It is curious that violation counting should emerge as the crucial issue that potentially pushes optimality theory out of the finite-state domain thus making it formally more powerful than rewrite systems and two-level models. It has never been presented as an argument against the older models that they do not allow unlimited counting. It is not clear whether the additional power constitutes an asset or an embarrassment for OT.</Paragraph>
  </Section>
</Paper>