<?xml version="1.0" standalone="yes"?> <Paper uid="P97-1040"> <Title>Efficient Generation in Primitive Optimality Theory</Title> <Section position="2" start_page="0" end_page="313" type="metho"> <SectionTitle> 1 Why formalize OT? </SectionTitle> <Paragraph position="0"> Phonology has recently undergone a paradigm shift.</Paragraph> <Paragraph position="1"> Since the seminal work of (Prince & Smolensky, 1993), phonologists have published literally hundreds of analyses in the new constraint-based framework of Optimality Theory, or OT. Old-style derivational analyses have all but vanished from the linguistics conferences.</Paragraph> <Paragraph position="2"> The price of this creative ferment has been a certain lack of rigor. The claim for OT as Universal Grammar is not substantive or falsifiable without formal definitions of the putative Universal Grammar objects Repns, Con, and Gen (see below).</Paragraph> <Paragraph position="3"> Formalizing OT is necessary not only to flesh it out as a linguistic theory, but also for the sake of computational phonology. Without knowing what classes of constraints may appear in grammars, we can say only so much about the properties of the system, or about algorithms for generation, comprehension, and learning.</Paragraph> <Paragraph position="4"> The central claim of OT is that the phonology of any language can be naturally described as successive filtering. In OT, a phonological grammar for a language consists of a vector C1, C2, ..., Cn of soft constraints drawn from a universal fixed set Con.</Paragraph> <Paragraph position="5"> Each constraint in the vector is a function that scores possible output representations (surface forms): (1) Ci : Repns → {0, 1, 2, ...} (Ci ∈ Con) If Ci(R) = 0, the output representation R is said to satisfy the ith constraint of the language. Otherwise it is said to violate that constraint, where the value of Ci(R) specifies the degree of violation. 
Each constraint yields a filter that permits only minimal violation of the constraint: (2) Filteri(Set) = {R ∈ Set : Ci(R) is minimal} Given an underlying phonological input, its set of legal surface forms under the grammar--typically of size 1--is just (3) Filtern(...Filter2(Filter1(Gen(input)))) where the function Gen is fixed across languages and Gen(input) ⊆ Repns is a potentially infinite set of candidate surface forms.</Paragraph> <Paragraph position="6"> In practice, each surface form in Gen(input) must contain a silent copy of input, so the constraints can score it on how closely its pronounced material matches input. The constraints also score other criteria, such as how easy the material is to pronounce. If C1 in a given language is violated by just the forms with coda consonants, then Filter1(Gen(input)) includes only coda-free candidates--regardless of their other demerits, such as discrepancies from input or unusual syllable structure. The remaining constraints are satisfied only as well as they can be given this set of survivors. Thus, when it is impossible to satisfy all constraints at once, successive filtering means early constraints take priority.</Paragraph> <Paragraph position="7"> Questions under the new paradigm include these: * Generation. How to implement the input-output mapping in (3)? A brute-force approach fails to terminate if Gen produces infinitely many candidates. Speakers must solve this problem. So must linguists, if they are to know what their proposed grammars predict.</Paragraph> <Paragraph position="8"> * Comprehension. How to invert the input-output mapping in (3)? Hearers must solve this. * Learning. How to induce a lexicon and a phonology like (1) for a particular language,</Paragraph> <Paragraph position="9"> given the kind of evidence available to child language learners? 
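The successive-filtering scheme of (2)-(3) can be sketched over a finite candidate set (a toy illustration, not the paper's automaton-based implementation; the two constraint functions are hypothetical stand-ins):

```python
# Successive filtering: each constraint keeps only minimal violators.

def filter_min(candidates, constraint):
    """Filter_i(Set) = {R in Set : C_i(R) is minimal} -- equation (2)."""
    best = min(constraint(r) for r in candidates)
    return [r for r in candidates if constraint(r) == best]

def generate(candidates, constraints):
    """Filter_n(...Filter_1(Gen(input))) -- equation (3), constraints ranked."""
    for c in constraints:
        candidates = filter_min(candidates, c)
    return candidates

# Hypothetical toy constraints over dotted syllable strings:
# a NoCoda-like constraint outranks a preference for longer (more
# faithful) outputs, so early constraints take priority.
no_coda = lambda r: sum(1 for syl in r.split(".") if syl.endswith("t"))
prefer_long = lambda r: -len(r)

winners = generate(["pat.ka", "pa.ka", "pa"], [no_coda, prefer_long])
# "pat.ka" is filtered out first despite being longest: early
# constraints are never traded off against later ones.
```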
None of these questions is well-posed without restrictions on Gen and Con.</Paragraph> <Paragraph position="10"> In the absence of such restrictions, computational linguists have assumed convenient ones. Ellison (1994) solves the generation problem where Gen produces a regular set of strings and Con admits all finite-state transducers that can map a string to a number in unary notation. (Thus Ci(R) = 4 if the Ci transducer outputs the string llll on input R.) Tesar (1995, 1996) extends this result to the case where Gen(input) is the set of parse trees for input under some context-free grammar (CFG).* Tesar's constraints are functions on parse trees such that Ci(\[A \[B1...\] \[B2...\]\]) can be computed from A, B1, B2, Ci(B1), and Ci(B2). The optimal tree can then be found with a standard dynamic-programming chart parser for weighted CFGs.</Paragraph> <Paragraph position="11"> It is an important question whether these formalisms are useful in practice. On the one hand, are they expressive enough to describe real languages? On the other, are they restrictive enough to admit good comprehension and unsupervised-learning algorithms? The present paper sketches primitive Optimality Theory (OTP)--a new formalization of OT that is explicitly proposed as a linguistic hypothesis. Representations are autosegmental, Gen is trivial, and only certain simple and phonologically local constraints are allowed. I then show the following: 1. Good news: Generation in OTP can be solved attractively with finite-state methods. The solution is given in some detail.</Paragraph> <Paragraph position="12"> 2. Good news: OTP usefully restricts the space of grammars to be learned. (In particular, Generalized Alignment is outside the scope of finite-state or indeed context-free methods.) 3. Bad news: While OTP generation is close to linear on the size of the input form, 
it is NP-hard on the size of the grammar, which for human languages is likely to be quite large.</Paragraph> <Paragraph position="13"> 4. Good news: Ellison's algorithm can be improved so that its exponential blowup is often avoided.</Paragraph> <Paragraph position="14"> *This extension is useful for OT syntax but may have little application to phonology, since the context-free case reduces to the regular case (i.e., Ellison) unless the CFG contains recursive productions.</Paragraph> </Section> <Section position="3" start_page="313" end_page="316" type="metho"> <SectionTitle> 2 Primitive Optimality Theory </SectionTitle> <Paragraph position="0"> Primitive Optimality Theory, or OTP, is a formalization of OT featuring a homogeneous output representation, extremely local constraints, and a simple, unrestricted Gen. Linguistic arguments for OTP's constraints and representations are given in (Eisner,</Paragraph> <Paragraph position="1"> 1997), whereas the present description focuses on its formal properties and suitability for computational work. An axiomatic treatment is omitted for reasons of space. Despite its simplicity, OTP appears capable of capturing virtually all analyses found in the (phonological) OT literature.</Paragraph> <Section position="1" start_page="313" end_page="314" type="sub_section"> <SectionTitle> 2.1 Repns: Representations in OTP </SectionTitle> <Paragraph position="0"> To represent \[mp\], OTP uses not the autosegmental representation in (4a) (Goldsmith, 1976; Goldsmith,</Paragraph> <Paragraph position="1"> 1990) but rather the simplified autosegmental representation in (4b), which has no association lines.</Paragraph> <Paragraph position="2"> Similarly (5a) is replaced by (5b). The central representational notion is that of a constituent timeline: an infinitely divisible line along which constituents are laid out. Every constituent has width and edges.</Paragraph> <Paragraph position="3"> (4) a. voi b. 
\[diagram: (4b) renders the voi, nas, C, and lab material of (4a) as edge-bracketed intervals on a timeline, with no association lines\] For phonetic interpretation: \]voi says to end voicing (laryngeal vibration). At the same instant, \]nas says to end nasality (raise velum). (5) a.</Paragraph> <Paragraph position="5"> A timeline can carry the full panoply of phonological and morphological constituents--anything that phonological constraints might have to refer to.</Paragraph> <Paragraph position="6"> Thus, a timeline bears not only autosegmental features like nasal gestures \[nas\] and prosodic constituents such as syllables \[σ\], but also stress marks \[x\], feature domains such as \[ATRdom\] (Cole & Kisseberth, 1994) and morphemes such as \[Stem\].</Paragraph> <Paragraph position="7"> All these constituents are formally identical: each marks off an interval on the timeline. Let Tiers denote the fixed finite set of constituent types, {nas,</Paragraph> <Paragraph position="8"> σ, x, ATRdom, Stem, ...}.</Paragraph> <Paragraph position="9"> It is always possible to recover the old representation (4a) from the new one (4b), under the convention that two constituents on the timeline are linked if their interiors overlap (Bird & Ellison, 1994). The interior of a constituent is the open interval that excludes its edges: thus, lab is linked to both consonants C in (4b), but the two consonants are not linked to each other, because their interiors do not overlap.</Paragraph> <Paragraph position="10"> By eliminating explicit association lines, OTP eliminates the need for faithfulness constraints on them, or for well-formedness constraints against gapping or crossing of associations. In addition, OTP can refer naturally to the edges of syllables (or morphemes). Such edges are tricky to define in (5a), because a syllable's features are scattered across multiple tiers and perhaps shared with adjacent syllables. 
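The linking convention can be made concrete with a small sketch (the tuple encoding of constituents as numeric intervals is an assumption for illustration, not the paper's representation): two constituents are linked iff their open interiors overlap, so sharing only an edge does not link them.

```python
# Constituents as (start, end) intervals on the timeline; the interior
# is the open interval, excluding both edges.

def interiors_overlap(a, b):
    """Open intervals overlap iff the larger start is strictly
    below the smaller end."""
    return max(a[0], b[0]) < min(a[1], b[1])

lab = (0, 4)   # one lab gesture spanning both consonants
c1  = (0, 2)   # first C
c2  = (2, 4)   # second C, sharing only an edge with c1

# lab is linked to both consonants, but c1 and c2 are not linked to
# each other: their interiors meet only at the shared edge point 2.
```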
In diagrams of timelines, such as (4b) and (5b), the intent is that only horizontal order matters.</Paragraph> <Paragraph position="11"> Horizontal spacing and vertical order are irrelevant.</Paragraph> <Paragraph position="12"> Thus, a timeline may be represented as a finite collection S of labeled edge brackets, equipped with ordering relations ≺ and ≈ that indicate which brackets precede each other or fall in the same place.</Paragraph> <Paragraph position="13"> Valid timelines (those in Repns) also require that edge brackets come in matching pairs, that constituents have positive width, and that constituents of the same type do not overlap (i.e., two constituents on the same tier may not be linked).</Paragraph> </Section> <Section position="2" start_page="314" end_page="314" type="sub_section"> <SectionTitle> 2.2 Gen: Input and output in OTP </SectionTitle> <Paragraph position="0"> OT's principle of Containment (Prince & Smolensky, 1993) says that each of the potential outputs in Repns includes a silent copy of the input, so that constraints evaluating it can consider the goodness of match between input and output. Accordingly, OTP represents both input and output constituents on the constituent timeline, but on different tiers.</Paragraph> <Paragraph position="1"> Thus surface nasal autosegments are bracketed with nas\[ and \]nas, while underlying nasal autosegments are bracketed with _nas_\[ and \]_nas_. The underlining is a notational convention to denote input material.</Paragraph> <Paragraph position="2"> No connection is required between \[nas\] and \[_nas_\] except as enforced by constraints that prefer \[nas\] and \[_nas_\] or their edges to overlap in some way. (6) shows a candidate in which underlying \[_nas_\] has surfaced &quot;in place&quot; but with rightward spreading.</Paragraph> <Paragraph position="4"> Here the left edges and interiors overlap, but the right edges fail to. 
Such overlap of interiors may be regarded as featural Input-Output Correspondence in the sense of (McCarthy & Prince, 1995).</Paragraph> <Paragraph position="5"> The lexicon and morphology supply to Gen an underspecified timeline--a partially ordered collection of input edges. The use of a partial ordering allows the lexicon and morphology to supply floating tones, floating morphemes, and templatic morphemes.</Paragraph> <Paragraph position="6"> Given such an underspecified timeline as lexical input, Gen outputs the set of all fully specified timelines that are consistent with it. No new input constituents may be added. In essence, Gen generates every way of refining the partial order of input constituents into a total order and decorating it freely with output constituents. Conditions such as the prosodic hierarchy (Selkirk, 1980) are enforced by universally high-ranked constraints, not by Gen.²</Paragraph> </Section> <Section position="3" start_page="314" end_page="316" type="sub_section"> <SectionTitle> 2.3 Con: The primitive constraints </SectionTitle> <Paragraph position="0"> Having described the representations used, it is now possible to describe the constraints that evaluate them. OTP claims that Con is restricted to the following two families of primitive constraints: (7) α → β (&quot;implication&quot;): &quot;Each α temporally overlaps some β.&quot; Scoring: Constraint(R) = number of α's in R that do not overlap any β.</Paragraph> <Paragraph position="1"> (8) α ⊥ β (&quot;clash&quot;): &quot;Each α temporally overlaps no β.&quot; Scoring: Constraint(R) = number of (α, β) pairs in R such that the α overlaps the β. That is, α → β says that α's attract β's, while α ⊥ β says that α's repel β's. These are simple and arguably natural constraints; no others are used. In each primitive constraint, α and β each specify a phonological event. An event is defined to be either a type of labeled edge, written e.g. 
σ\[, or the interior (excluding edges) of a type of labeled constituent, written e.g. σ. To express some constraints that appear in real phonologies, it is also necessary to allow α and β to be non-empty conjunctions and disjunctions of events. However, it appears possible to limit these cases to the forms in (9)-(10). Note that other forms, such as those in (11), can be decomposed into a sequence of two or ²The formalism is complicated slightly by the possibility of deleting segments (syncope) or inserting segments (epenthesis), as illustrated by the candidates below.</Paragraph> <Paragraph position="2"> (i) Syncope (CVC → CC): the _V_ is crushed to zero width so the C's can be adjacent.</Paragraph> <Paragraph position="3"> \[timeline diagram: the surface C's abut across a zero-width _V_\] (ii) Epenthesis (CC → CVC): the _C_'s are pushed apart.</Paragraph> <Paragraph position="4"> \[timeline diagram: a surface V intervenes between the underlying _C_'s\] In order to allow adjacency of the surface consonants in (i), as expected by assimilation processes (and encouraged by a high-ranked constraint), note that the underlying vowel must be allowed to have zero width--an option available to input but not output constituents. The input representation must specify only _v_\[ ≼ \]_v_, not _v_\[ ≺ \]_v_. Similarly, to allow (ii), the input representation must specify only \]_c1_ ≼ _c2_\[, not \]_c1_ ≺ _c2_\[. more constraints.³ (9) (α1 and α2 and ...) → (β1 or β2 or ...) Scoring: Constraint(R) = number of sets of events {A1, A2, ...} of types α1, α2, ... respectively that all overlap on the timeline and whose intersection does not overlap any event of type β1, β2, ...</Paragraph> <Paragraph position="5"> (10) (α1 and α2 and ...) ⊥ (β1 and β2 and ...) Scoring: Constraint(R) = number of sets of events {A1, A2, ..., B1, B2, ...} of types α1, α2, ..., β1, β2, ... 
respectively that all overlap on the timeline.</Paragraph> <Paragraph position="6"> </Paragraph> <Paragraph position="7"> The unifying theme is that each primitive constraint counts the number of times a candidate gets into some bad local configuration. This is an interval on the timeline throughout which certain events (one or more specified edges or interiors) are all present and certain other events (zero or more specified edges or interiors) are all absent.</Paragraph> <Paragraph position="8"> Several examples of phonologically plausible constraints, with monikers and descriptions, are given below. (Eisner, 1997) shows how to rewrite hundreds of constraints from the literature in the primitive constraint notation, and discusses the problematic case of reduplication. (Eisner, in press) gives a detailed stress typology using only primitive constraints; in particular, non-local constraints such as FTBIN, FOOTFORM, and Generalized Alignment (McCarthy & Prince, 1993) are eliminated.</Paragraph> <Paragraph position="9"> (12) a. ONSET: σ\[ → C\[ &quot;Every syllable starts with a consonant.&quot; b. NONFINALITY: \]Word ⊥ \]F &quot;The end of a word may not be footed.&quot; c. F\[ → σ\[, \]F → \]σ &quot;Feet start and end on syllable boundaries.&quot; d. PACKFEET: \]F → F\[ &quot;Each foot is followed immediately by another foot; i.e., minimize the number of gaps between feet. Note that the final foot, if any, will always violate this constraint.&quot; e. NOCLASH: \]x ⊥ x\[ &quot;Two stress marks may not be adjacent.&quot; f. PROGRESSIVEVOICING: \]voi ⊥ C\[ &quot;If the segment preceding a consonant is voiced, voicing may not stop prior to the ³Such a sequence does alter the meaning slightly. To get the exact original meaning, we would have to decompose into so-called &quot;unranked&quot; constraints, whereby Ci(R) is defined as Ci1(R) + Ci2(R). 
But such ties undermine OT's idea of strict ranking: they confer the power to minimize linear functions such as (C1 + C1 + C1 + C2).</Paragraph> <Paragraph position="11"> For this reason, OTP currently disallows unranked constraints; I know of no linguistic data that crucially require them.</Paragraph> <Paragraph position="12"> g. NASVOI: nas → voi &quot;Every nasal gesture must be at least partly voiced.&quot; h. FULLNASVOI: nas ⊥ voi\[, nas ⊥ \]voi &quot;A nasal gesture may not be only partly voiced.&quot; i. MAX(voi) or PARSE(voi): _voi_ → voi &quot;Underlying voicing features surface.&quot; j. DEP(voi) or FILL(voi): voi → _voi_ &quot;Voicing features appear on the surface only if they are also underlying.&quot; k. NOSPREADRIGHT(voi): voi ⊥ \]_voi_ &quot;Underlying voicing may not spread rightward as in (6).&quot; l. NONDEGENERATE: F → μ\[ &quot;Every foot must cross at least one mora boundary μ\[.&quot; m. TAUTOMORPHEMICFOOT: F ⊥ Morph\[ &quot;No foot may cross a morpheme boundary.&quot; 3 Finite-state generation in OTP 3.1 A simple generation algorithm Recall that the generation problem is to find the output set Sn, where (13) a. S0 = Gen(input) ⊆ Repns b. Si+1 = Filteri+1(Si) ⊆ Si Since in OTP, the input is a partial order of edge brackets, and Sn is a set of one or more total orders (timelines), a natural approach is to successively refine a partial order. This has merit. However, not every Si can be represented as a single partial order, so the approach is quickly complicated by the need to encode disjunction.</Paragraph> <Paragraph position="13"> A simpler approach is to represent Si (as well as input and Repns) as a finite-state automaton (FSA), denoting a regular set of strings that encode timelines. 
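The automaton-based filtering step can be mimicked on explicit strings (a toy stand-in, not the paper's FSA machinery: here a constraint is a small deterministic weighted automaton over one tier's symbols, and filtering keeps the candidates of minimal total path weight, as BestPaths pruning would):

```python
# Score a timeline string with a deterministic weighted automaton,
# then keep only minimal violators.

def score(aut, string):
    """Total weight of the automaton's unique run on `string`."""
    state, total = aut["start"], 0
    for sym in string:
        state, w = aut["delta"][(state, sym)]
        total += w
    return total

def filter_best(candidates, aut):
    best = min(score(aut, s) for s in candidates)
    return [s for s in candidates if score(aut, s) == best]

# Hypothetical PackFeet-style constraint on a single tier: weight 1
# whenever a constituent ends (']') and a gap ('-') follows instead
# of a new constituent ('[').
packfeet = {
    "start": "q0",
    "delta": {
        ("q0", "["): ("q0", 0), ("q0", "]"): ("q1", 0), ("q0", "-"): ("q0", 0),
        ("q1", "["): ("q0", 0), ("q1", "]"): ("q0", 0), ("q1", "-"): ("q0", 1),
    },
}

survivors = filter_best(["[][]", "[]-[]"], packfeet)
# the candidate with a gap between constituents incurs weight 1
```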
The idea is essentially due to (Ellison, 1994), and can be boiled down to two lines: (14) a. S0 := input ∩ Repns b. Si+1 := BestPaths(Si ∩ Ci+1).</Paragraph> </Section> <Section position="4" start_page="316" end_page="316" type="sub_section"> <SectionTitle> 3.2 OTP with automata </SectionTitle> <Paragraph position="0"> We may encode each timeline as a string over an enormous alphabet Σ. If |Tiers| = k, then each symbol in Σ is a k-tuple, whose components describe what is happening on the various tiers at a given moment. The components are drawn from a smaller alphabet A = { \[, \], |, +, - }. Thus at any time, the ith tier may be beginning or ending a constituent ( \[, \] ) or both at once ( | ), or it may be in a steady state in the interior or exterior of a constituent ( +, - ).</Paragraph> <Paragraph position="1"> At a minimum, the string must record all moments where there is an edge on some tier. If all tiers are in a steady state, the string need not use any symbols to say so. Thus the string encoding is not unique.</Paragraph> <Paragraph position="2"> (15) gives an expression for all strings that correctly describe the single tier shown. (16) describes a two-tier timeline consistent with (15). Note that the brackets on the two tiers are ordered with respect to each other. Timelines like these could be assembled morphologically from one or more lexical entries (Bird & Ellison, 1994), or produced in the course of algorithm (14).</Paragraph> <Paragraph position="4"> We store timeline expressions like (16) as deterministic FSAs. To reduce the size of these automata, it is convenient to label arcs not with individual elements of Σ (which is huge) but with subsets of Σ, denoted by predicates. We use conjunctive predicates where each conjunct lists the allowed symbols on a given tier: (17) +F, -σ, \[|+-voi (arc label w/ 3 conjuncts) The arc label in (17) is said to mention the tiers F, σ, voi ∈ Tiers. 
Such a predicate allows any symbol from A on the tiers it does not mention.</Paragraph> <Paragraph position="5"> The input FSA constrains only the input tiers. In (14) we intersect it with Repns, which constrains only the output tiers. Repns is defined as the intersection of many automata exactly like (18), called tier rules, which ensure that brackets are properly paired on a given tier such as F (foot).</Paragraph> </Section> </Section> <Section position="4" start_page="316" end_page="319" type="metho"> <SectionTitle> (18) -F ,+F </SectionTitle> <Paragraph position="0"> Like the tier rules, the constraint automata Ci are small and deterministic and can be built automatically. Every edge has weight 0 or 1. With some care it is possible to draw each Ci with two or fewer states, and with a number of arcs proportional to the number of tiers mentioned by the constraint.</Paragraph> <Paragraph position="1"> Keeping the constraints small is important for efficiency, since real languages have many constraints that must be intersected.</Paragraph> <Paragraph position="2"> Let us do the hardest case first. An implication constraint has the general form (9). Suppose that all the αi are interiors, not edges. Then the constraint targets intervals of the form α = α1 ∩ α2 ∩ ... Each time such an interval ends without any βj having occurred during it, one violation is counted: (19) \[two-state automaton diagram; weight-1 arcs are shown in bold, others are weight-0: a &quot;β during α&quot; arc leads to the right-hand state, and a bold weight-1 arc fires when the α ends without one\] A candidate that does see a βj during an α can go and rest in the right-hand state for the duration of the α.</Paragraph> <Paragraph position="3"> Let us fill in the details of (19). How do we detect the end of an α? Because one or more of the αi end ( \], | ), while all the αi either end or continue ( + ), so that we know we are leaving an α.⁵ Thus: (20) \[arc labels: (in all αi) − (some βj), and: in all αi\] An unusually complex example is shown in (21). 
Note that to preserve the form of the predicates in (17) and keep the automaton deterministic, we need to split some of the arcs above into multiple arcs. Each βj gets its own arc, and we must also expand set differences into multiple arcs, using the scheme W − x∧y∧z = W ∧ ¬(x∧y∧z) = (W ∧ ¬x) ∨ (W ∧ x ∧ ¬y) ∨ (W ∧ x ∧ y ∧ ¬z).</Paragraph> <Paragraph position="4"> ⁵It is important to take \], not +, as our indication that we have been inside a constituent. This means that the timeline ( \[, -)(+, -)*(+, \[)(+, +)*(\], +)(-, +)*(-, \]) cannot avoid violating a clash constraint simply by instantiating the (+, +)* part as ε. Furthermore, the \] convention means that a zero-width input constituent (more precisely, a sequence of zero-width constituents, represented as a single | symbol) will often act as if it has an interior. Thus if _V_ syncopates as in footnote 2, it still violates the parse constraint _V_ → V. This is an explicit property of OTP: otherwise, nothing that failed to parse would ever violate PARSE, because it would be gone! On the other hand, \] does not have this special role on the right-hand side of →, which does not quantify universally over an interval. The consequence for zero-width constituents is that even if a zero-width _V_ overlaps (at the edge, say) with a surface V, the latter cannot claim on this basis alone to satisfy FILL: V → _V_. This too seems like the right move linguistically, although further study is needed.</Paragraph> <Paragraph position="5"> </Paragraph> <Paragraph position="6"> How about other cases? If the antecedent of an implication is not an interval, then the constraint needs only one state, to penalize moments when the antecedent holds and the consequent does not. Finally, a clash constraint α1 ⊥ α2 ⊥ ... is identical to the implication constraint (α1 and α2 and ...) → FALSE. Clash FSAs are therefore just degenerate versions of implication FSAs, where the arcs looking for βj do not exist because they would accept no symbol. 
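The counting behavior of the implication automaton (19) can also be imitated procedurally (a sketch under an assumed two-tier encoding, each moment a pair of symbols from { \[, \], |, +, - }; the actual construction is the weighted FSA described above, and edge-moment subtleties from footnote 5 are glossed over here):

```python
# Count a -> b violations: a-constituents on the first tier whose
# interior never overlaps a b-constituent's interior on the second.

def violations(timeline):
    in_a = in_b = seen_b = False
    count = 0
    for a_sym, b_sym in timeline:
        # right edges close constituents at this moment
        if a_sym in "]|" and in_a:
            if not seen_b:
                count += 1        # the a ended without seeing any b
            in_a = False
        if b_sym in "]|":
            in_b = False
        # left edges open constituents at this moment
        if a_sym in "[|":
            in_a, seen_b = True, False
        if b_sym in "[|":
            in_b = True
        # strictly between edges, interiors overlap if both are open
        if in_a and in_b:
            seen_b = True
    return count

# satisfied: the b-interior falls inside the a-interior
ok = [("[", "-"), ("+", "["), ("+", "]"), ("]", "-")]
# violated: the a runs its whole course with no b at all
bad = [("[", "-"), ("+", "-"), ("]", "-")]
```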
(22) shows the constraints (p and \]q) → b and p ⊥ q.</Paragraph> <Section position="1" start_page="317" end_page="317" type="sub_section"> <SectionTitle> 4.1 Generalized Alignment is not finite-state </SectionTitle> <Paragraph position="0"> Ellison's method can succeed only on a restricted formalism such as OTP, which does not admit such constraints as the popular Generalized Alignment (GA) family of (McCarthy & Prince, 1993). A typical GA constraint is ALIGN(F, L, Word, L), which sums the number of syllables between each left foot edge F\[ and the left edge of the prosodic word. Minimizing this sum achieves a kind of left-to-right iterative footing. OTP argues that such non-local, arithmetic constraints can generally be eliminated in favor of simpler mechanisms (Eisner, in press).</Paragraph> <Paragraph position="1"> Ellison's method cannot directly express the above GA constraint, even outside OTP, because it cannot compute a quadratic function 0 + 2 + 4 + ... on a string like \[σσ\]F \[σσ\]F \[σσ\]F ... Path weights in an FSA cannot be more than linear on string length.</Paragraph> <Paragraph position="2"> Perhaps the filtering operation of any GA constraint can be simulated with a system of finite-state constraints? No: GA is simply too powerful.</Paragraph> <Paragraph position="3"> The proof is suppressed here for reasons of space, but it relies on a form of the pumping lemma for weighted FSAs. The key insight is that among candidates with a fixed number of syllables and a single (floating) tone, ALIGN(σ, L, H, L) prefers candidates where the tone docks at the center. 
A similar argument for weighted CFGs (using two tones) shows this constraint to be too hard even for (Tesar, 1996).</Paragraph> </Section> <Section position="2" start_page="317" end_page="318" type="sub_section"> <SectionTitle> 4.2 Generation is NP-complete even in OTP </SectionTitle> <Paragraph position="0"> When algorithm (14) is implemented literally and with moderate care, using an optimizing C compiler on a 167MHz UltraSPARC, it takes fully 3.5 minutes (real time) to discover a stress pattern for a short sequence of syllables.⁶ The automata become impractically huge due to intersections.</Paragraph> <Paragraph position="1"> Much of the explosion in this case is introduced at the start and can be avoided. Because Repns has 2^|Tiers| = 512 states, S0, S1, and S2 each have about 5000 states and 500,000 to 775,000 arcs.</Paragraph> <Paragraph position="2"> Thereafter the Si automata become smaller, thanks to the pruning performed at each step by BestPaths.</Paragraph> <Paragraph position="3"> This repeated pruning is already an improvement over Ellison's original algorithm (which saves pruning till the end, and so continues to grow exponentially with every new constraint). If we modify (14) further, so that each tier rule from Repns is intersected with the candidate set only when its tier is first mentioned by a constraint, then the automata are pruned back as quickly as they grow. They have about 10 times fewer states and 100 times fewer arcs,</Paragraph> <Paragraph position="4"> and the generation time drops to 2.2 seconds.</Paragraph> <Paragraph position="5"> This is a key practical trick. But neither it nor any other trick can help for all grammars, for in the worst case, the OTP generation problem is NP-hard on the number of tiers used by the grammar. The locality of constraints does not save us here. Many NP-complete problems, such as graph coloring or bin packing, attempt to minimize some global count subject to numerous local restrictions. 
In the case of OTP generation, the global count to minimize is the degree of violation of Ci, and the local restrictions are imposed by C1, C2, ..., Ci−1.</Paragraph> <Paragraph position="6"> Proof of NP-hardness (by polytime reduction from Hamilton Path). Given G = (V(G), E(G)), an n-vertex directed graph. Put Tiers = V(G) ∪ {Stem, S}. Consider the following vector of O(n²) primitive constraints (ordered as shown): (23) a. ∀v ∈ V(G): v\[ → S\[ b. ∀v ∈ V(G): \]v → \]S c. ∀v ∈ V(G): Stem → v d. Stem ⊥ S e. ∀u, v ∈ V(G) s.t. uv ∉ E(G): \]u ⊥ v\[ f. \]S → S\[ ⁶The grammar is taken from the OTP stress typology proposed by (Eisner, in press). It has tier rules for 9 tiers, and then spends 26 constraints on obvious universal properties of moras and syllables, followed by 6 constraints for universal properties of feet and stress marks and finally 6 substantive constraints that can be freely reranked to yield different stress systems, such as left-to-right iambs with iambic lengthening.</Paragraph> <Paragraph position="7"> Suppose the input is simply \[Stem\]. Filtering Gen(input) through constraints (23a-d), we are left with just those candidates where Stem bears n (disjoint) constituents of type S, each coextensive with a constituent bearing a different label v ∈ V(G). (These candidates satisfy (23a-c) but violate (23d) n times.) (23e) says that a chain of abutting constituents \[u|v|w\]... is allowed only if it corresponds to a path in G. Finally, (23f) forces the grammar to minimize the number of such chains. If the minimum is 1 (i.e., an arbitrarily selected output candidate violates (23f) only once), then G has a Hamilton path. When confronted with this pathological case, the finite-state methods respond essentially by enumerating all possible permutations of V(G) (though with sharing of prefixes). The machine state stores, among other things, the subset of V(G) that has already been seen; so there are at least 2^|Tiers| states. 
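The reduction's constraint vector (23) is easy to generate mechanically (a sketch; the tuple encoding of implication and clash constraints is an illustrative assumption, and the exact form of (23f) is reconstructed from its gloss):

```python
# Build the O(n^2) constraint vector (23) for a directed graph G.
# "->" marks an implication constraint, "clash" a clash constraint.

def otp_constraints(vertices, edges):
    cons = []
    for v in vertices:
        cons.append(("->", f"{v}[", "S["))   # (23a) each v starts with an S
    for v in vertices:
        cons.append(("->", f"]{v}", "]S"))   # (23b) each v ends with an S
    for v in vertices:
        cons.append(("->", "Stem", v))       # (23c) Stem overlaps every v
    cons.append(("clash", "Stem", "S"))      # (23d) violated once per S
    for u in vertices:                       # (23e) v may follow u only if
        for v in vertices:                   #       uv is an edge of G
            if u != v and (u, v) not in edges:
                cons.append(("clash", f"]{u}", f"{v}["))
    cons.append(("->", "]S", "S["))          # (23f) minimize chain count
    return cons

cs = otp_constraints(["a", "b", "c"], {("a", "b"), ("b", "c")})
# 3 + 3 + 3 + 1 constraints, plus one clash per missing edge, plus (23f)
```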
It must be emphasized that if the grammar is fixed in advance, algorithm (14) is close to linear in the size of the input form: it is dominated by a constant number of calls to Dijkstra's BestPaths method, each taking time O(|input arcs| log |input states|). There are nonetheless three reasons why the above result is important. (a) It raises the practical specter of huge constant factors (> 2⁴⁰) for real grammars. Even if a fixed grammar can somehow be compiled into a fast form for use with many inputs, the compilation itself will have to deal with this constant factor. (b) The result has the interesting implication that candidate sets can arise that cannot be concisely represented with FSAs. For if all Si were polynomial-sized in (14), the algorithm would run in polynomial time. (c) Finally, the grammar is not fixed in all circumstances: both linguists and children crucially experiment with different theories.</Paragraph> </Section> <Section position="3" start_page="318" end_page="319" type="sub_section"> <SectionTitle> 4.3 Work in progress: Factored automata </SectionTitle> <Paragraph position="0"> The previous section gave a useful trick for speeding up Ellison's algorithm in the typical case. We are currently experimenting with additional improvements along the same lines, which attempt to defer intersection by keeping tiers separate as long as possible.</Paragraph> <Paragraph position="1"> The idea is to represent the candidate set Si not as a large unweighted FSA, but rather as a collection A of preferably small unweighted FSAs, called factors, each of which mentions as few tiers as possible. This collection, called a factored automaton, serves as a compact representation of ∩A. It usually has far fewer states than ∩A would if the intersection were carried out.</Paragraph> <Paragraph position="2"> For instance, the natural factors of S0 are input and all the tier rules (see (18)). This requires only O(|Tiers| + |input|) states, not O(2^|Tiers| · 
\[input\[). Using factored automata helps Ellison's algorithm (14) in several ways: * The candidate sets Si tend to be represented more compactly.</Paragraph> <Paragraph position="3"> * In (14), the constraint Ci+l needs to be intersected with only certain factors of Si.</Paragraph> <Paragraph position="4"> * Sometimes Ci+l does not need to be intersected with the input, because they do not mention any of the same tiers. Then step i + 1 can be performed in time independent of input length.</Paragraph> <Paragraph position="5"> Example: input = , which is a 43-state automaton, and C1 is F -- x, which says that every foot bears a stress mark. Then to find</Paragraph> <Paragraph position="7"> S0's tier rules for F and x, which require well-formed feet and well-formed stress marks, and combine them with C1 to get a new factor that requires stressed feet. No other factors need be involved.</Paragraph> <Paragraph position="8"> The key operation in (14) is to find Bestpaths(A 71 C), where .4 is an unweighted factored automaton and C is an ordinary weighted FSA (a constraint).</Paragraph> <Paragraph position="9"> This is the best intersection problem. For concreteness let us suppose that C encodes F ---* x, a two-state constraint.</Paragraph> <Paragraph position="10"> A naive idea is simply to add F ---* x to ..4 as a new factor. However, this ignores the BestPaths step: we wish to keep just the best paths in r\[ ~ x\[ that are compatible with A. Such paths might be long and include cycles in F\[ ---* x\[. For example, a weight-1 path would describe a chain of optimal stressed feet interrupted by a single unstressed one where A happens to block stress.</Paragraph> <Paragraph position="11"> A corrected variant is to put I -- 71.A and run BestPaths on I 71 C. Let the pruned result be B.</Paragraph> <Paragraph position="12"> We could add B directly back to to ,4 as a new factor, but it is large. 
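The corrected variant just described, BestPaths over I ∩ C, can be sketched as a Dijkstra search on the product automaton. This is a minimal illustration under assumed encodings: the dict-based FSA representation, the name best_cost, and the toy automata are not the paper's implementation.

```python
import heapq

# Hedged sketch of the "best intersection" step: a Dijkstra-style search
# over the product of an unweighted automaton I and a weighted constraint
# automaton C, so that only the cheapest (least-violating) paths survive.
# The dict-based FSA encoding is an illustrative assumption.

def best_cost(I, C, start, finals):
    """I and C map state -> list of (symbol, weight, next_state) arcs;
    arcs of I carry weight 0. Return the minimum total violation weight
    from the product state `start` to any state in `finals`."""
    frontier = [(0, start)]
    done = set()
    while frontier:
        cost, (i, c) = heapq.heappop(frontier)
        if (i, c) in done:
            continue
        done.add((i, c))
        if (i, c) in finals:
            return cost
        for sym_i, _, ni in I.get(i, []):
            for sym_c, w, nc in C.get(c, []):
                if sym_i == sym_c:  # product arcs synchronize on symbols
                    heapq.heappush(frontier, (cost + w, (ni, nc)))
    return None  # no accepting path

# Toy example: I accepts the single string "ab"; C charges weight 1
# for each "b" (a one-state weighted constraint).
I_toy = {0: [("a", 0, 1)], 1: [("b", 0, 2)]}
C_toy = {0: [("a", 0, 0), ("b", 1, 0)]}
cost = best_cost(I_toy, C_toy, (0, 0), {(2, 0)})  # -> 1
```

Pruning during the search is what the naive "just add C as a factor" idea misses: here the weights guide the search rather than merely decorating the intersection.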
We would rather add a smaller factor B' that has the same effect, in that I ∩ B' = I ∩ B. (B' will look something like the original C, but with some paths missing, some states split, and some cycles unrolled.) Observe that each state of B has the form i × c for some i ∈ I and c ∈ C. We form B' from B by &quot;re-merging&quot; states i × c and i' × c where possible, using an approach similar to DFA minimization.</Paragraph> <Paragraph position="13"> Of course, this variant is not very efficient, because it requires us to find and use I = ∩A. What we really want is to follow the above idea but use a smaller I, one that considers just the relevant factors in A. We need not consider factors that will not affect the choice of paths in C above.</Paragraph> <Paragraph position="14"> Various approaches are possible for choosing such an I. The following technique is completely general, though it may or may not be practical.</Paragraph> <Paragraph position="15"> Observe that for BestPaths to do the correct thing, I needs to reflect the sum total of A's constraints on F and x, the tiers that C mentions. More formally, we want I to be the projection of the candidate set ∩A onto just the F and x tiers. Unfortunately, these constraints are not reflected solely in the factors mentioning F or x, since the allowed configurations of F and x may be mediated through additional factors. As an example, there may be a factor mentioning F and some third tier, some of whose paths are incompatible with the input factor, because the latter allows that tier only in certain places or because it allows only paths of length 14.</Paragraph> <Paragraph position="16"> 1. Number the tiers such that F and x are numbered 0, and all other tiers have distinct positive numbers.</Paragraph> <Paragraph position="17"> 2. Partition the factors of A into lists L0, L1, L2, ..., Lk, according to the highest-numbered tier they mention. (Any factor that mentions no tiers at all goes onto L0.) 3.
If k = 0, then return ∩Lk as our desired I.</Paragraph> <Paragraph position="18"> 4. Otherwise, ∩Lk exhausts tier k's ability to mediate relations among the factors. Modify the arc labels of ∩Lk so that they no longer restrict (mention) k. Then add a determinized, minimized version of the result to Lj, where j is the highest-numbered tier it now mentions.</Paragraph> <Paragraph position="19"> 5. Decrement k and return to step 3.</Paragraph> <Paragraph position="20"> If A has k factors, this technique must perform k - 1 intersections, just as if we had put I = ∩A. However, it intersperses the intersections with determinization and minimization operations, so that the automata being intersected tend not to be large. In the best case, we will have k - 1 intersection-determinization-minimizations that cost O(1) apiece, rather than k - 1 intersections that cost up to O(2^k) apiece.</Paragraph> </Section> </Section> </Paper>