<?xml version="1.0" standalone="yes"?>
<Paper uid="N03-1009">
  <Title>Simpler and More General Minimization for Weighted Finite-State Automata</Title>
  <Section position="4" start_page="5" end_page="5" type="metho">
    <SectionTitle>
3 Pushing and Its Limitations
</SectionTitle>
    <Paragraph position="0"> The intuition behind pushing is to canonicalize states' suffix functions. This increases the chance that two states will have the same suffix function. In the example of the previous section, we were able to replace F3 with ww\F3 (pushing the ww backwards onto state 3's incoming arc), making it equal to F1 so {1,3} could merge.</Paragraph>
    <Paragraph position="1"> Since canonicalization was also performed at states 2 and 4, F1 and F3 ended up with identical representations: arc weights were distributed identically along corresponding paths from 1 and 3. Hence unweighted minimization could discover that F1 = F3 and merge {1,3}.</Paragraph>
    <Paragraph position="2"> Mohri's pushing strategy (we will see others) is always to extract some sort of "maximum left factor" from each suffix function Fq and push it backwards. That is, he expresses Fq = k ⊗ G for as "large" a k ∈ K as possible (a maximal common prefix), then pushes the factor k back out of the suffix function so that it is counted earlier on paths through q (i.e., before reaching q). q's suffix function now has the canonical form G (i.e., k\Fq).</Paragraph>
    <Paragraph position="3"> How does Mohri's strategy reduce to practice? For transducers, where (K,⊗) = (Δ*,concat), the maximum left factor of Fq is the longest common prefix of the strings in range(Fq) (footnote 3). Thus we had range(F3) = {wwyz,wwzzz} above, with longest common prefix ww.</Paragraph>
    <Paragraph position="4"> For the tropical semiring (R≥0 ∪ {∞},min,+), where k\m = m − k is defined only if k ≤ m, the maximum left factor k is the minimum of range(Fq).</Paragraph>
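    <Paragraph> The two choices of maximum left factor just described can be sketched in Python. This is only an illustration under stated assumptions, not Mohri's implementation: the function names are hypothetical, range(Fq) is assumed to be given as a finite list, and the only concrete data taken from the text is range(F3) = {wwyz, wwzzz}.

```python
from os.path import commonprefix  # character-wise longest common prefix


def max_left_factor_strings(outputs):
    """Transducer case: longest common prefix of the strings in range(Fq)."""
    return commonprefix(list(outputs))


def left_divide_string(k, m):
    """k\\m for strings: strip the prefix k from m; undefined otherwise."""
    if not m.startswith(k):
        raise ValueError("left quotient undefined: k is not a prefix of m")
    return m[len(k):]


def max_left_factor_tropical(weights):
    """Tropical case (min, +): the maximum left factor is the minimum weight."""
    return min(weights)


def left_divide_tropical(k, m):
    """k\\m = m - k, defined only when k does not exceed m."""
    if k > m:
        raise ValueError("left quotient undefined: k exceeds m")
    return m - k


range_F3 = ["wwyz", "wwzzz"]            # range(F3) from the running example
k = max_left_factor_strings(range_F3)   # factor to push back through state 3
residue = [left_divide_string(k, s) for s in range_F3]
print(k, residue)
```

 In both cases the factor that is pushed back is the "largest" element that left-divides every value in the range, and the quotient is what remains in the canonicalized suffix function.</Paragraph>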
    <Paragraph position="5"> But "maximum left factor" is not an obvious notion for all semirings. If we extended the tropical semiring with negative numbers, or substituted the semiring (R≥0,+,×), keeping the usual definition of "maximum," then any function would have arbitrarily large left factors. A more fundamentally problematic example is the semiring Z[√−5]. It is defined as ({m+n√−5 : m,n ∈ Z},+,×), where Z denotes the integers. It is a standard example of a commutative algebra in which factorization is not unique. For example, 6 = 2 ⊗ 3 = (1 + √−5) ⊗ (1 − √−5), and these 4 factors cannot be factored further. This makes it impossible to canonicalize F2 in the automaton below. [Figure: example automaton over Z[√−5]; not preserved in this extraction.]</Paragraph>
    <Paragraph position="6"> What is the best left factor to extract from F2? We could left-divide F2 by either 2 or 1 + √−5. The former action allows us to merge {1,2} and the latter to merge {2,3}; but we cannot have it both ways. So this automaton has no unique minimization! The minimum of 4 states is achieved by two distinct answers (contrast footnote 2). Footnote 3: range(Fq) excludes 0 (the weight of unaccepted strings). Left factors are unaffected, as anything can divide 0.</Paragraph>
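    <Paragraph> The non-unique factorization of 6 can be checked mechanically. The sketch below is an illustration only: it represents m + n√−5 as the integer pair (m, n), and the helper names mul and norm are assumptions introduced here. It relies on the standard norm N(m + n√−5) = m² + 5n², which is multiplicative.

```python
def mul(x, y):
    """Multiply m1 + n1*sqrt(-5) by m2 + n2*sqrt(-5), each given as (m, n)."""
    (m1, n1), (m2, n2) = x, y
    return (m1 * m2 - 5 * n1 * n2, m1 * n2 + n1 * m2)


def norm(x):
    """Multiplicative norm N(m + n*sqrt(-5)) = m^2 + 5 n^2."""
    m, n = x
    return m * m + 5 * n * n


six_a = mul((2, 0), (3, 0))     # 2 * 3
six_b = mul((1, 1), (1, -1))    # (1 + sqrt(-5)) * (1 - sqrt(-5))
print(six_a, six_b)

# The four factors have norms 4, 9, 6 and 6.  A proper factorization of any
# of them would need an element of norm 2 or 3.  Such an element would have
# |m| at most 1 and n = 0, so this small search window is exhaustive.
small = [(m, n) for m in range(-3, 4) for n in range(-1, 2)]
has_norm_2_or_3 = any(norm(x) in (2, 3) for x in small)
print(has_norm_2_or_3)
```

 Since no element has norm 2 or 3, none of 2, 3, 1 + √−5, 1 − √−5 can be factored further, which is what leaves two incompatible ways to left-divide F2.</Paragraph>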
    <Paragraph position="7"> It follows that known minimization techniques will not work in general semirings, as they assume state mergeability to be transitive (footnote 4). In general the result of minimization is not even well-defined (i.e., unique).</Paragraph>
    <Paragraph position="8"> Of course, given a deterministic automaton M, one may still seek an equivalent M̄ with as few states as possible. But we will now see that even finding the minimum number of states is NP-complete, and inapproximable.</Paragraph>
    <Paragraph position="9"> The NP-hardness proof [which may be skipped on a first reading] is by reduction from Minimum Clique Partition. Given a graph with vertex set V = {1,2,...n} and edge set E, we wish to partition V into as few cliques as possible. (S ⊆ V is a clique of the graph iff ij ∈ E for all pairs of distinct i,j ∈ S.) Determining the minimum number of cliques is NP-complete and inapproximable: that is, unless P=NP, we cannot even find it within a factor of 2 or 3 or any other constant factor in polynomial time (footnote 5). Footnote 4: A further wrinkle lies in deciding what and how to push; in general semirings, it can be necessary to shift weights forward as well as backward along paths. Modify the example above by pushing a factor of 2 backwards through state 2. Making F2 = F3 in this modified example now requires pushing 2 forward and then 1 + √−5 backward through state 2. Footnote 5: This problem is just the dual of Graph Coloring. For detailed approximability results see (Crescenzi and Kann, 1998).</Paragraph>
    <Paragraph position="10"> Given such a graph, we reduce the clique problem to our problem. Consider the "bitwise boolean" semiring ({0,1}^n, OR, AND). Each weight k is a string of n bits, denoted k_1,...k_n. For each i ∈ V, define f^i, k^i, m^i ∈ K as follows: f^i_j = 0 iff ij ∈ E; k^i_j = 1 iff i = j; m^i_j = 0 iff either ij ∈ E or i = j. Now consider the following automaton M over the alphabet Σ = {a,b,c_1,...c_n}.</Paragraph>
    <Paragraph position="11"> The states are {0,1,...n,n+1}; 0 is the initial state and n+1 is the only final state. For each i ∈ V, there is an arc 0 -[c_i : 1^n]→ i and arcs i -[a : k^i]→ (n+1) and i -[b : m^i]→ (n+1). A minimum-state automaton equivalent to M must have a topology obtained by merging some states of V.</Paragraph>
    <Paragraph position="12"> Other topologies that could accept the same language (c_1|c_2|···|c_n)(a|b) are clearly not minimal (they can be improved by merging final states or by trimming).</Paragraph>
    <Paragraph position="13"> We claim that for S ⊆ {1,2,...n}, it is possible to merge all states in S into a single state (in the automaton) if and only if S is a clique (in the graph):  * If S is a clique, then define k,m ∈ K by k_i = 1 iff i ∈ S, and m_i = 1 iff i ∉ S. Observe that for every i ∈ S, we have k^i = f^i ⊗ k, m^i = f^i ⊗ m. So by pushing back a factor of f^i at each i ∈ S, one can make all i ∈ S share a suffix function and then merge them.</Paragraph>
    <Paragraph position="14"> * If S is not a clique, then choose i,j ∈ S so that ij ∉ E. Considering only bit i, there exists no bit pair (k_i,m_i) ∈ {0,1}^2 of which (k^i_i,m^i_i) = (1,0) and (k^j_i,m^j_i) = (0,1) are both left-multiples. So there can exist no weight pair (k,m) of which (k^i,m^i) and (k^j,m^j) are both left-multiples. It is therefore not possible to equalize the suffix functions Fi and Fj by left-dividing each of them (footnote 6), so i and j cannot be merged.</Paragraph>
    <Paragraph position="15"> Thus, the partitions of V into cliques are identical to the partitions of V into sets of mergeable states, which are in 1-1 correspondence with the topologies of automata equivalent to M and derived from it by merging. There is an N-clique partition of V iff there is an (N+2)-state automaton. It follows that finding the minimum number of states is as hard, and as hard to approximate within a constant factor, as finding the minimum number of cliques.</Paragraph>
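    <Paragraph> The claim that mergeable state sets coincide with cliques can be checked by brute force on a small instance. The sketch below is an illustration under assumptions: the 4-vertex graph is hypothetical, the helper names are introduced here, and "mergeable" is tested directly as the existence of a weight pair (k, m) of which every (k^i, m^i), i ∈ S, is a left-multiple under bitwise AND. It considers backward pushing only, as in the argument above.

```python
from itertools import combinations, product

n = 4
V = range(1, n + 1)
E = {frozenset(e) for e in [(1, 2), (2, 3), (1, 3), (3, 4)]}  # hypothetical graph


def adjacent(i, j):
    return frozenset((i, j)) in E


def bits(pred):
    """Weight in {0,1}^n whose j-th bit is 1 exactly when pred(j) holds."""
    return tuple(int(pred(j)) for j in V)


def AND(x, y):
    """The semiring product: bitwise AND of two weights."""
    return tuple(a * b for a, b in zip(x, y))


# Arc weights leaving state i: k^i on input a, m^i on input b.
k_w = {i: bits(lambda j, i=i: i == j) for i in V}
m_w = {i: bits(lambda j, i=i: not (adjacent(i, j) or i == j)) for i in V}
K = list(product((0, 1), repeat=n))


def mergeable(S):
    """True if some (k, m) has every (k^i, m^i), i in S, as a left-multiple."""
    return any(
        all(any(AND(g, k) == k_w[i] and AND(g, m) == m_w[i] for g in K)
            for i in S)
        for k in K for m in K
    )


def is_clique(S):
    return all(adjacent(i, j) for i, j in combinations(S, 2))


subsets = [S for r in range(1, n + 1) for S in combinations(V, r)]
print(all(mergeable(S) == is_clique(S) for S in subsets))
```

 The exhaustive search is exponential in n and is meant only to make the two bullet points above concrete, not to suggest an algorithm.</Paragraph>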
  </Section>
  <Section position="5" start_page="5" end_page="5" type="metho">
    <SectionTitle>
4 When Is Minimization Unique?
</SectionTitle>
    <Paragraph position="0"> The previous section demonstrated the existence of pathological weight semirings. We now partially characterize the "well-behaved" semirings (K,⊕,⊗) in which all automata do have unique minimizations. Except when otherwise stated, lowercase variables are weights ∈ K and uppercase ones are K-valued rational functions.</Paragraph>
    <Paragraph position="1"> [This section may be skipped, except the last paragraph.] A crucial necessary condition is that (K,⊗) allow what we will call greedy factorization, meaning that given f ⊗ F = g ⊗ G ≠ 0, it is always possible to express F = f′ ⊗ H and G = g′ ⊗ H. This condition holds for many practically useful semirings, commutative or otherwise. It says, roughly, that the order in which left factors are removed from a suffix function does not matter. We can reach the same canonical H regardless of whether we left-divide first by f or g.</Paragraph>
    <Paragraph position="2"> Footnote 6: This argument only shows that pushing backward cannot give them the same suffix function. But pushing forward cannot help either, despite footnote 4, since 1^n on the arc to i has no right factors other than itself (the identity) to push forward.</Paragraph>
    <Paragraph position="3"> Given a counterexample to this condition, one can construct an automaton with no unique minimization. Simply follow the plan of the Z[√−5] example, putting F1 = F, F2 = f ⊗ F = g ⊗ G, F3 = G.</Paragraph>
    <Paragraph position="5"> Some useful semirings do fail the condition. One is the "bitwise boolean" semiring that checks a string's membership in two languages at once: (K,⊕,⊗) = ({00,01,10,11}, OR, AND). (Let F2 = 01 ⊗ [...].)</Paragraph>
    <Paragraph position="7"> [...] pointwise × (which computes a string's probability under two models) fails similarly. So does (sets,∩,∪) (which collects features found along the accepting path).</Paragraph>
    <Paragraph position="8"> We call H a residue of F iff F = f′ ⊗ H for some f′. Write F ≃ G iff F, G have a common residue. In these terms, (K,⊗) allows greedy factorization iff F ≃ G whenever F, G are residues of the same nonzero function.</Paragraph>
    <Paragraph position="9"> More perspicuously, one can show that this holds iff ≃ is an equivalence relation on nonzero, K-valued functions.</Paragraph>
    <Paragraph position="10"> So in semirings where minimization is uniquely defined, ≃ is necessarily an equivalence relation. Given an automaton M for function F, we may regard ≃ as an equivalence relation on the states of a trimmed version of M (footnote 8): q ≃ r iff Fq ≃ Fr. Let [r] = {r1,...,rm} be the (finite) equivalence class of r: we can inductively find at least one function F[r] that is a common residue of Fr1,...,Frm. The idea behind minimization is to construct a machine M̄ whose states correspond to these equivalence classes, and where each [r] has suffix function F[r]. The Appendix shows that M̄ is then minimal.</Paragraph>
    <Paragraph position="11"> If M has an arc q -[a:k]→ r, M̄ needs an arc [q] -[a:k′]→ [r], where k′ is such that a⁻¹F[q] = k′ ⊗ F[r].</Paragraph>
    <Paragraph position="12"> The main difficulty in completing the construction of M̄ is to ensure each weight k′ exists. That is, F[r] must be carefully chosen to be a residue not only of Fr1,...,Frm (which ultimately does not matter, as long as F[0] is a residue of F0, where 0 is the start state) but also of a⁻¹F[q]. If M is cyclic, this imposes cyclic dependencies on the choices of the various F[q] and F[r] functions. We have found no simple necessary and sufficient condition on (K,⊗) that guarantees a globally consistent set of choices to exist. However, we have given a useful necessary condition (greedy factorization), and we now give a useful sufficient condition. Say that H is a minimum residue of G ≠ 0 if it is a residue of every residue of G. (If G has several minimum residues, they are all residues of one another.) If (K,⊗) is such that every G has a minimum residue (a strictly stronger condition than greedy factorization), then it can be shown that G has the same minimum residues as any H ≃ G. In such a (K,⊗), M̄ can be constructed by choosing the suffix functions F[r] independently. Just let F[r] = F{r1,...,rm} be a minimum residue of Fr1. Now consider again M's arc q -[a:k]→ r: since a⁻¹F[q] ≃ a⁻¹Fq ≃ Fr ≃ Fr1, we see that F[r] is a (minimum) residue of a⁻¹F[q], so that a weight k′ can be chosen for [q] -[a:k′]→ [r].</Paragraph>
    <Paragraph position="13"> A final step ensures that M̄ defines the function F. To describe it, we must augment the formalism to allow an initial weight ι(0) ∈ K, and a final weight φ(r) ∈ K for each final state r. The weight of an accepting path from the start state 0 to a final state r is now defined to be ι(0) ⊗ (weights of arcs along the path) ⊗ φ(r). In M̄, we set ι([0]) to some k such that F0 = k ⊗ F[0], and set φ([r]) = F[r](ε). The mathematical construction is done.</Paragraph>
  </Section>
  <Section position="6" start_page="5" end_page="5" type="metho">
    <SectionTitle>
5 A Simple Minimization Recipe
</SectionTitle>
    <Paragraph position="0"> We now give an effective algorithm for minimization in the semiring (K,⊗). The algorithmic recipe has one ingredient: along with (K,⊗), the user must give us a left-factor functional λ that can choose a left factor λ(F) of any function F. Formally, if Σ is the input alphabet, then we require λ : (Σ* → K) → K to have the following properties for any rational F : Σ* → K, any a ∈ Σ, and any k ∈ K:  * Shifting: λ(k ⊗ F) = k ⊗ λ(F).  * Quotient: λ(F) is a left factor of λ(a⁻¹F).  * Final-quotient: λ(F) is a left factor of F(ε) (footnote 9).  The algorithm generalizes Mohri's strategy as outlined in section 2. We just use λ to pick the left factors during pushing. The λ's used by Mohri for two semirings were mentioned in section 3. We will define another λ in section 6. Naturally, it can be shown that no λ can exist in a semiring that lacks greedy factorization, such as Z[√−5]. The 3 properties above are needed for the strategy to work. The strategy also requires (K,⊗) to be left cancellative, i.e., k ⊗ m = k ⊗ m′ implies m = m′ (if k ≠ 0). In other words, left quotients by k are unique when they exist (except for 0\0). This relieves us from having to make arbitrary choices of weight during pushing. Incompatible choices might prevent arc labels from matching as desired during the merging step of section 2.</Paragraph>
    <Paragraph position="1"> Footnote 9: To show the final-quotient property given the other two, it suffices to show that λ(G) ∈ K has a right inverse in K, where G is the function mapping ε to 1 and everything else to 0.</Paragraph>
    <Paragraph position="2"> Suppose we are given an input DFA. At each state q, simultaneously, we will push back λ(Fq). This pushing construction is trivial once the λ(Fq) values are computed.</Paragraph>
    <Paragraph position="4"> An arc q -[a:k]→ r should have its weight changed from k to k′ = λ(Fq)\λ(a⁻¹Fq) = λ(Fq)\λ(k ⊗ Fr), which is well-defined (by the quotient property and left cancellativity; see footnote 10) and can be computed as λ(Fq)\(k ⊗ λ(Fr)) (by the shifting property). Thus a subpath q -[a:k]→ r -[b:ℓ]→ s, with weight k ⊗ ℓ, will become q -[a:k′]→ r -[b:ℓ′]→ s, with weight k′ ⊗ ℓ′ = (λ(Fq)\(k ⊗ λ(Fr))) ⊗ (λ(Fr)\(ℓ ⊗ λ(Fs))). In this way the factor λ(Fr) is removed from the start of all paths from r, and is pushed backwards through r onto the end of all paths to r. It is possible for this factor (or part of it) to travel back through multiple arcs and around cycles, since k′ is found by removing a λ(Fq) factor from all of k ⊗ λ(Fr) and not merely from k.</Paragraph>
    <Paragraph position="5"> As it replaces the arc weights, pushing also replaces the initial weight ι(0) with ι(0) ⊗ λ(F0), and replaces each final weight φ(r) with λ(Fr)\φ(r) (which is well-defined, by the final-quotient property). Altogether, pushing leaves path weights unchanged (by easy induction; see footnote 11). After pushing, we finish with merging and trimming as in section 2. While merging via unweighted DFA minimization treats arc weights as part of the input symbols, what should it do with any initial and final weights? The start state's initial weight should be preserved. The merging algorithm can and should be initialized with a multi-way partition of states by final weight, instead of just a 2-way partition into final vs. non-final (footnote 12). The Appendix shows that this strategy indeed finds the unique minimal automaton.</Paragraph>
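    <Paragraph> The pushing step can be sketched in Python for one concrete case. The sketch assumes the real semiring (R,+,×), where ⊗ is ordinary multiplication and k\m = m/k, and it assumes the left factors λ(Fq) are already known and nonzero. The 3-state DFA, the values in lam, and the function names are hypothetical illustrations, not taken from the paper.

```python
from fractions import Fraction

# Hypothetical DFA: arcs are (source, input, weight, target) tuples.
arcs = [(0, "a", Fraction(2), 1), (0, "b", Fraction(6), 2),
        (1, "c", Fraction(3), 2), (2, "d", Fraction(1, 2), 2)]
initial = {0: Fraction(1)}
final = {2: Fraction(5)}
lam = {0: Fraction(30), 1: Fraction(15), 2: Fraction(5)}  # assumed lambda(Fq)


def push(arcs, initial, final, lam):
    """Move lam[q] from the paths leaving q onto the paths entering q."""
    new_arcs = [(q, a, k * lam[r] / lam[q], r) for q, a, k, r in arcs]
    new_initial = {q: w * lam[q] for q, w in initial.items()}
    new_final = {r: w / lam[r] for r, w in final.items()}
    return new_arcs, new_initial, new_final


def path_weight(arcs, initial, final, start, word):
    """Weight of the unique path reading word from start (0 if rejected)."""
    delta = {(q, a): (k, r) for q, a, k, r in arcs}
    q, w = start, initial[start]
    for a in word:
        if (q, a) not in delta:
            return Fraction(0)
        k, q = delta[(q, a)]
        w *= k
    return w * final.get(q, Fraction(0))


new_arcs, new_initial, new_final = push(arcs, initial, final, lam)
for word in ["b", "ac", "acd", "bdd", "a"]:
    before = path_weight(arcs, initial, final, 0, word)
    after = path_weight(new_arcs, new_initial, new_final, 0, word)
    print(word, before, after)
```

 Exact fractions are used so that the comparison of path weights before and after pushing is not blurred by floating-point rounding. In a non-commutative semiring the order of the factors in the reweighting would matter and the division would have to be a left quotient.</Paragraph>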
    <Paragraph position="6"> It is worth clarifying how this section's effective algorithm implements the mathematical construction from the end of section 4. At each state q, pushing replaces the suffix function Fq with λ(Fq)\Fq. The quotient properties of λ are designed to guarantee that this quotient is defined (footnote 13), and the shifting property is designed to ensure that it is a minimum residue of Fq (footnote 14). In short, if the conditions of this section are satisfied, so are the conditions of section 4, and the construction is the same.</Paragraph>
    <Paragraph position="7"> Footnote 10: Except in the case 0\0, which is not uniquely defined. This arises only if Fq = 0, i.e., q is a dead state that will be trimmed later, so any value will do for 0\0: arcs from q are irrelevant.</Paragraph>
    <Paragraph position="8"> Footnote 11: One may prefer a formalism without initial or final weights. If the original automaton is free of final weights (other than 1), so is the pushed automaton, provided that λ(F) = 1 whenever F(ε) = 1, as is true for all λ's in this paper. Initial weights can be eliminated at the cost of duplicating state 0 (details omitted).</Paragraph>
    <Paragraph position="9"> Footnote 12: Alternatively, Mohri (2000, §4.5) explains how to temporarily eliminate final weights before the merging step.</Paragraph>
    <Paragraph position="11"> Footnote 13: That is, λ(Fq)\Fq(γ) exists for each γ ∈ Σ*. One may show by induction on |γ| that the left quotients λ(F)\F(γ) exist for all F. When |γ| = 0 this is the final-quotient property. Otherwise γ = aγ′, and λ(F)\F(γ) = (λ(F)\λ(a⁻¹F)) ⊗ (λ(a⁻¹F)\(a⁻¹F)(γ′)), where the first factor exists by the quotient property and the second factor exists by inductive hypothesis.</Paragraph>
    <Paragraph position="12"> The converse is true as well, at least for right cancellative semirings. If such a semiring satisfies the conditions of section 4 (every function has a minimum residue), then the requirements of this section can be met to obtain an effective algorithm: there exists a λ satisfying our three properties (footnote 15), and the semiring is left cancellative (footnote 16).</Paragraph>
  </Section>
  <Section position="7" start_page="5" end_page="5" type="metho">
    <SectionTitle>
6 Minimization in Division Semirings
</SectionTitle>
    <Paragraph position="0"> For the most important idea of this paper, we turn to a common special case. Suppose the semiring (K,⊕,⊗) defines k\m for all m, k ≠ 0 ∈ K. Equivalently (footnote 17), suppose every k ≠ 0 ∈ K has a unique two-sided inverse k⁻¹ ∈ K. Useful cases of such division semirings include the real semiring (R,+,×), the tropical semiring extended with negative numbers (R ∪ {∞},min,+), and expectation semirings (Eisner, 2002). Minimization has not previously been available in these.</Paragraph>
    <Paragraph position="1"> We propose a new left-factor functional that is fast to compute and works in arbitrary division semirings. We avoid the temptation to define λ(F) as the ⊕-sum of range(F): this definition has the right properties, but in some semirings including (R≥0,+,×) the infinite summation is quite expensive to compute and may even diverge. Instead (unlike Mohri) we will permit our λ(F) to depend on more than just range(F).</Paragraph>
    <Paragraph position="2"> Footnote 14: Suppose X is any residue of Fq, i.e., we can write Fq = x ⊗ X. Then we can rewrite the identity Fq = λ(Fq) ⊗ (λ(Fq)\Fq), using the shifting property, as x ⊗ X = x ⊗ λ(X) ⊗ (λ(Fq)\Fq). As we have separately required the semiring to be left cancellative, this implies that X = λ(X) ⊗ (λ(Fq)\Fq). So (λ(Fq)\Fq) is a residue of any residue X of Fq, as claimed.</Paragraph>
    <Paragraph position="3"> Footnote 15: Define λ(0) = 0. From each equivalence class of nonzero functions under ≃, pick a single minimum residue (axiom of choice). Given F, let [F] denote the minimum residue from its class. Observe that F = f ⊗ [F] for some f; right cancellativity implies f is unique. So define λ(F) = f. Shifting property: λ(k ⊗ F) = λ(k ⊗ f ⊗ [F]) = k ⊗ f = k ⊗ λ(f ⊗ [F]) = k ⊗ λ(F). Quotient property: λ(a⁻¹F) ⊗ [a⁻¹F] = a⁻¹F = [...].</Paragraph>
    <Paragraph position="5"> Final-quotient property: the quotient exists since F(ε) = λ(F) ⊗ [F](ε). Footnote 16: Let ⟨x,y⟩ denote the function mapping a to x, b to y, and everything else to 0. Given k ⊗ m = k ⊗ m′, we have k ⊗ ⟨m,1⟩ = k ⊗ ⟨m′,1⟩. Since the minimum residue property implies greedy factorization, we can write ⟨m,1⟩ = f ⊗ ⟨a,b⟩, ⟨m′,1⟩ = g ⊗ ⟨a,b⟩. Then f ⊗ b = g ⊗ b, so by right cancellativity f = g, whence m = f ⊗ a = g ⊗ a = m′.</Paragraph>
    <Paragraph position="6"> Order the space of input strings Σ* by length, breaking ties lexicographically. For example, ε &lt; bb &lt; aab &lt; aba &lt; abb. Now define λ(F) = F(min support(F)) if F ≠ 0, and λ(F) = 0 if F = 0,</Paragraph>
    <Paragraph position="8"> where support(F) denotes the set of input strings to which F assigns a non-0 weight. This λ clearly has the shifting property needed by section 5. The quotient and final-quotient properties come for free because we are in a division semiring and because λ(F) = 0 iff F = 0.</Paragraph>
    <Paragraph position="9"> Under this definition, what is λ(Fq) for a suffix function Fq? Consider all paths of nonzero weight (footnote 18) from state q to a final state. If none exist, λ(Fq) = 0. Otherwise, min support(Fq) is the input string on the shortest such path, breaking ties lexicographically (footnote 19). λ(Fq) is simply the weight of that shortest path.</Paragraph>
    <Paragraph position="10"> To push, we must compute λ(Fq) for each state q. This is easy because λ(Fq) is the weight of a single, minimum-length and hence acyclic path from q. (Previous methods combined the weights of all paths from q, even if infinitely many.) It also helps that the left factors at different states are related: if the minimum path from q begins with a weight-k arc to r, then it continues along the minimum path from r, so λ(Fq) = k ⊗ λ(Fr).</Paragraph>
    <Paragraph position="11"> Below is a trivial linear-time algorithm for computing λ(Fq) at every q. Each state and arc is considered once in a breadth-first search back from the final states. len(q) and first(q) store the string length and first letter of a running minimum of support(Fq) ⊆ Σ*.</Paragraph>
    <Paragraph position="12">
  1. foreach state q
  2.   if q is final then
  3.     len(q) := 0                 (* min support(Fq) is ε for final q *)
  4.     λ(Fq) := φ(q)               (* Fq(ε) is just the final weight, φ(q) *)
  5.     enqueue q on a FIFO queue
  6.   else
  7.     len(q) := ∞                 (* not yet discovered *)
  8.     λ(Fq) := 0                  (* assume Fq = 0 until we discover q *)
  9. until the FIFO queue is empty
 10.   dequeue a state r
 11.   foreach arc q -[a:k]→ r entering r such that k ≠ 0
 12.     if len(q) = ∞ then enqueue q        (* breadth-first search *)
 13.     if len(q) = ∞ or (len(q) = len(r) + 1 and a &lt; first(q)) then
 14.       first(q) := a             (* reduce min support(Fq) *)
 15.       len(q) := len(r) + 1
 16.       λ(Fq) := k ⊗ λ(Fr)

 The runtime is O(|states| + t·|arcs|) if ⊗ has runtime t. If ⊗ is slow, this can be reduced to O(t·|states| + |arcs|) by removing line 16 and waiting until the end, when the minimum path from each non-final state q is fully known, to compute the weight λ(Fq) of that path. Simply finish up by calling FIND-λ on each state q:

 FIND-λ(state q):
  1. if λ(Fq) = 0 and len(q) &lt; ∞ then
  2.   λ(Fq) := σ(q,first(q)) ⊗ FIND-λ(δ(q,first(q)))
  3. return λ(Fq)

 Footnote 19: [...] lexicographic ordering, a*b ⊆ Σ* would have no min.

 After thus computing λ(Fq), we simply proceed with pushing, merging, and trimming as in section 5 (footnote 20). Pushing runs in time O(t·|arcs|) and trimming in O(|states| + |arcs|). Merging is worse, with time O(|arcs| log |states|).</Paragraph>
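    <Paragraph> The breadth-first computation above translates fairly directly into Python. The following is a sketch under assumptions rather than a reference implementation: the automaton encoding (arc tuples and a dictionary of final weights), the function name left_factors, and the default of ordinary multiplication for ⊗ are all choices made here for illustration.

```python
from collections import deque
import math


def left_factors(states, arcs, final, times=lambda x, y: x * y, zero=0):
    """Return lambda(Fq) for every state q: the weight of the shortest
    accepting path from q, ties broken by the first input letter.
    arcs holds (q, a, k, r) tuples; final maps final states to weights."""
    entering = {q: [] for q in states}
    for q, a, k, r in arcs:
        entering[r].append((q, a, k))
    length = {q: math.inf for q in states}   # len(q): not yet discovered
    first = {}                               # first letter of the running minimum
    lam = {q: zero for q in states}          # assume Fq = 0 until q is discovered
    queue = deque()
    for q in states:
        if q in final:
            length[q], lam[q] = 0, final[q]
            queue.append(q)
    while queue:
        r = queue.popleft()
        for q, a, k in entering[r]:
            if k == zero:
                continue
            undiscovered = length[q] == math.inf
            if undiscovered:
                queue.append(q)              # breadth-first search
            if undiscovered or (length[q] == length[r] + 1 and first[q] > a):
                first[q] = a
                length[q] = length[r] + 1
                lam[q] = times(k, lam[r])
    return lam


states = [0, 1, 2]
arcs = [(0, "a", 2, 1), (0, "b", 6, 2), (1, "c", 3, 2), (2, "d", 0.5, 2)]
lam = left_factors(states, arcs, final={2: 5})
print(lam)
```

 Because the queue is first-in first-out, every state at distance d from a final state is dequeued before any state at distance d + 1, so λ(Fr) is already settled when r is dequeued. Comparing only the first letter suffices for the tie-break because the automaton is deterministic: two different arcs out of q carry different input letters.</Paragraph>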
  </Section>
  <Section position="8" start_page="5" end_page="5" type="metho">
    <SectionTitle>
7 A Bonus: Non-Division Semirings
</SectionTitle>
    <Paragraph position="0"> The trouble with Z[√−5] was that it "lacked" needed quotients. The example on p. 3 can easily be minimized (down to 3 states) if we regard it instead as defined over (C,+,×), letting us use any weights in C. Simply use section 6's algorithm.</Paragraph>
    <Paragraph position="1"> This new change-of-semiring trick can be used for other non-division semirings as well. One can extend the original weight semiring (K,⊕,⊗) to a division semiring by adding ⊗-inverses (footnote 21). In this way, the tropical semiring (R≥0 ∪ {∞},min,+) can be augmented with the negative reals to obtain (R ∪ {∞},min,+). And the transducer semiring (Δ* ∪ {∅},min,concat) (footnote 22) can be augmented by extending the alphabet Δ = {x,y,...} with inverse letters {x⁻¹,y⁻¹,...}.</Paragraph>
    <Paragraph position="2"> The minimized DFA we obtain may have "weird" arc weights drawn from the extended semiring. But the arc weights combine along paths to produce the original automaton's outputs, which fall in the original semiring. Let us apply this trick to the example of section 2, yielding the following pushed automaton in which F1 = F3 as desired. (x⁻¹,y⁻¹,... are written as X,Y,..., and λ(Fq) is displayed at each q.) [Figure: pushed automaton; not preserved in this extraction.] For example, the z⁻¹y⁻¹zzz output on the 3→4 arc was computed as λ(F3)⁻¹ ⊗ wwzzz ⊗ λ(F4) = (wwyz)⁻¹ ⊗ wwzzz ⊗ ε = z⁻¹y⁻¹w⁻¹w⁻¹wwzzz.</Paragraph>
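    <Paragraph> The arithmetic on strings with inverse letters can be reproduced in a few lines of Python. This is a sketch under assumptions: following the convention just stated, an uppercase letter stands for the inverse of the corresponding lowercase letter, the alphabet is assumed to consist of lowercase letters only, and the function names are introduced here.

```python
def cancel(word):
    """Cancel adjacent pairs such as wW or Ww, where W is the inverse of w."""
    out = []
    for c in word:
        if out and out[-1] != c and out[-1].lower() == c.lower():
            out.pop()          # a letter meets its own inverse
        else:
            out.append(c)
    return "".join(out)


def inverse(word):
    """Inverse of a word: reverse it and invert every letter."""
    return word[::-1].swapcase()


def times(x, y):
    """Concatenation in the extended semiring, followed by cancellation."""
    return cancel(x + y)


lam_F3, lam_F4 = "wwyz", ""    # lambda(F3) = wwyz and lambda(F4) = epsilon
new_weight = times(times(inverse(lam_F3), "wwzzz"), lam_F4)
print(new_weight)              # the reduced form of z^-1 y^-1 w^-1 w^-1 wwzzz
```

 A single left-to-right pass with a stack is enough, since every cancellation only exposes the symbol directly beneath it. The printed value corresponds to z⁻¹y⁻¹zzz, the output shown on the 3→4 arc.</Paragraph>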
    <Paragraph position="3"> This trick yields new algorithms for the tropical semiring and sequential transducers, which is interesting and perhaps worthwhile. How do they compare with previous work? Over the tropical semiring, our linear-time pushing algorithm is simpler than (Mohri, 1997), and faster by a log factor, because it does not require a priority queue. (Though this does not help the overall complexity of minimization, which is dominated by the merging step.) We also have no need to implement faster algorithms for special cases, as Mohri proposes, because our basic algorithm is already linear. Finally, our algorithm generalizes better, as it can handle negative weight cycles in the input. These are useful in (e.g.) conditional random fields. Footnote 20: It is also permissible to trim the input automaton at the start, or right after computing λ (note that λ(Fq) = 0 iff we should trim q). This simplifies pushing and merging. No trimming is then needed at the end, except to remove the one dead state that the merging step may have added to complete the automaton. Footnote 21: This is often possible but not always; the semiring must be cancellative, and there are other conditions. Even disregarding ⊕ because we are minimizing a deterministic automaton, it is not simple to characterize when the monoid (K,⊗) can be embedded into a group (Clifford and Preston, 1967, chapter 12). Footnote 22: Where min can be defined as in section 6 and footnote 1.</Paragraph>
    <Paragraph position="4"> On the other hand, Mohri's algorithm guarantees a potentially useful property that we do not: that the weight of the prefix path reading α ∈ Σ* is the minimum weight of all paths with prefix α. Commonly this approximates −log(p(most probable string with prefix α)), perhaps a useful value to look up for pruning.</Paragraph>
    <Paragraph position="5"> As for transducers, how does our minimization algorithm (above) compare with previous ones? Following earlier work by Choffrut and others, Mohri (2000) defines λ(Fq) as the longest common prefix of range(Fq).</Paragraph>
    <Paragraph position="6"> He constrains these values with a set of simultaneous equations, and solves them by repeated changes of variable using a complex relaxation algorithm. His implementation uses various techniques (including a trie and a graph decomposition) to make pushing run in time O(|states| + |arcs| · max_q |λ(Fq)|) (footnote 23). Breslauer (1996) gives a different computation of the same result.</Paragraph>
    <Paragraph position="7"> To implement our simpler algorithm, we represent strings in Δ* as pointers into a global trie that extends upon lookup. The strings are actually stored reversed in the trie so that it is fast to add and remove short prefixes. Over the extended alphabet, we use the pointer pair (k,m) to represent the string k⁻¹m where k,m ∈ Δ* have no common prefix. Such pointer pairs can be equality-tested in O(1) time during merging. For k,m ∈ Δ*, k ⊗ m is computed in time O(|k|), and k\m in time O(|LCP(k,m)|) or more loosely O(|k|) (where LCP = longest common prefix).</Paragraph>
    <Paragraph position="8"> The total time to compute our λ(Fq) values is therefore O(|states| + t·|arcs|), where t is the maximum length of any arc's weight. For each arc we then compute a new weight as a left-quotient by a λ value. So our total runtime for pushing is O(|states| + |arcs| · max_q |λ(Fq)|). This may appear identical to Mohri's runtime, but in fact our |λ(Fq)| ≥ Mohri's, though the two definitions share a worst case of t·|states| (footnote 24).</Paragraph>
    <Paragraph position="9"> Footnote 23: We define |ε| = 1 to simplify the O(···) expressions. Footnote 24: The |λ(Fq)| term contributed by a given arc from q is a bound on the length of the LCP of the outputs of certain paths from q. Mohri uses all paths from q and we use just two, so our LCP is sometimes longer. However, both LCPs probably tend to be short in practice, especially if one bypasses LCP(k,k) with special handling for k\k = ε.</Paragraph>
    <Paragraph position="10"> Inverse letters must be eliminated from the minimized transducer if one wishes to pass it to any specialized algorithms (composition, inversion) that assume weights in Δ*. Fortunately this is not hard. If state q of the result was formed by merging states q1,...qj, define ρ(q) = LCS{λ(Fqi) : i = 1,...j} ∈ Δ* (where LCS = longest common suffix). Now push the minimized transducer using ρ(q)⁻¹ in place of λ(Fq) for all q. This corrects for "overpushing": any letters ρ(q) that were unnecessarily pushed back before minimization are pushed forward again, cancelling the inverse letters. In our running example, state 0 will push (xyz)⁻¹ back and the merged state {1,3} will push (yz)⁻¹ back. This is equivalent to pushing ρ(0) = xyz forward through state 0 and the yz part of it forward through {1,3}, canceling the z⁻¹y⁻¹ at the start of one of the next arcs.</Paragraph>
    <Paragraph position="11"> We must show that the resulting labels really are free of inverse letters. Their values are as if the original pushing had pushed back not λ(Fqi) ∈ Δ* but only its shorter prefix λ̂(qi) := λ(Fqi)/ρ(qi) ∈ Δ* (note the right quotient). In other words, an arc from qi to r_i′ with weight k ∈ Δ* was reweighted as λ̂(qi)\(k ⊗ λ̂(r_i′)). Any inverse letters in such new weights clearly fall at the left. So suppose the new weight on the arc from q to r begins with an inverse letter z⁻¹. Then λ̂(qi) must have ended with z for each i = 1,...j. But then ρ(qi) was not the longest common suffix: zρ(qi) is a longer one, a contradiction (Q.E.D.).</Paragraph>
    <Paragraph position="12"> Negative weights can be similarly eliminated after minimization over the tropical semiring, if desired, by substituting min for LCS.</Paragraph>
    <Paragraph position="13"> The optional elimination of inverse letters or negative weights does not affect the asymptotic runtime. A caveat here is that the resulting automaton no longer has a canonical form. Consider a straight-line automaton: pushing yields a canonical form as always, but inverse-letter elimination completely undoes pushing (λ̂(qi) = ε). This is not an issue in Mohri's approach.</Paragraph>
  </Section>
  <Section position="9" start_page="5" end_page="5" type="metho">
    <SectionTitle>
8 Conclusion and Final Remarks
</SectionTitle>
    <Paragraph position="0"> We have characterized the semirings over which weighted deterministic automata can be minimized (section 4), and shown how to perform such minimization in both general and specific cases (sections 5, 6, 7). Our technique for division semirings and their subsemirings pushes back, at each state q, the output of a single, easily found, shortest accepting path from q. This is simpler and more general than previous approaches that aggregate all accepting paths from q.</Paragraph>
    <Paragraph position="1"> Our new algorithm (section 6) is most important for previously unminimizable, practically needed division semirings: real (e.g., for probabilities), expectation (for learning (Eisner, 2002)), and additive with negative weights (for conditional random fields (Lafferty et al., 2001)). It can also be used in non-division semirings, as for transducers. It is unpatented, easy to implement, comparable or faster in asymptotic runtime, and perhaps faster in practice (especially for the tropical semiring, where it seems preferable in most respects).</Paragraph>
    <Paragraph position="2"> Our approach applies also to R-weighted sequential transducers as in (Cortes et al., 2002). Such automata can be regarded as weighted by the product semiring (R × Δ*,(+,min),(×,concat)). Equivalently, one can push the numeric and string components independently.</Paragraph>
    <Paragraph position="3"> Our new pushing algorithm enables not only minimization but also equivalence-testing in more weight semirings. Equivalence is efficiently tested by pushing the (deterministic) automata to canonicalize their arc labels and then testing unweighted equivalence (Mohri, 1997).</Paragraph>
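    <Paragraph> Equivalence testing by pushing can be sketched as follows for the real semiring. This is an illustration under several assumptions, not the paper's implementation: both automata are assumed to be trimmed deterministic automata with nonzero arc and final weights and initial weight 1, the dictionary encoding and function names are introduced here, and the left factors are found by a simple quadratic scan rather than the linear-time search of section 6.

```python
from collections import deque
from fractions import Fraction


def pushed(dfa):
    """Push a trimmed real-weighted DFA; returns (initial weight, arcs, finals)."""
    start, arcs, final = dfa                    # arcs: {(q, a): (k, r)}
    lam, length, first = dict(final), {q: 0 for q in final}, {}
    queue = deque(final)
    while queue:                                # search back from the final states
        r = queue.popleft()
        for (q, a), (k, r2) in arcs.items():
            if r2 != r or k == 0:
                continue
            if q not in length:
                queue.append(q)
            elif not (length[q] == length[r] + 1 and first[q] > a):
                continue
            first[q], length[q], lam[q] = a, length[r] + 1, k * lam[r]
    new_arcs = {(q, a): (k * lam[r] / lam[q], r) for (q, a), (k, r) in arcs.items()}
    return lam[start], new_arcs, {r: w / lam[r] for r, w in final.items()}


def equivalent(d1, d2):
    """Compare the pushed automata state pair by state pair."""
    i1, arcs1, fin1 = pushed(d1)
    i2, arcs2, fin2 = pushed(d2)
    if i1 != i2:
        return False
    seen, todo = {(d1[0], d2[0])}, [(d1[0], d2[0])]
    while todo:
        p, q = todo.pop()
        if fin1.get(p) != fin2.get(q):
            return False
        out1 = {a: kr for (s, a), kr in arcs1.items() if s == p}
        out2 = {a: kr for (s, a), kr in arcs2.items() if s == q}
        if {a: kr[0] for a, kr in out1.items()} != {a: kr[0] for a, kr in out2.items()}:
            return False
        for a in out1:
            pair = (out1[a][1], out2[a][1])
            if pair not in seen:
                seen.add(pair)
                todo.append(pair)
    return True


one, two = Fraction(1), Fraction(2)
d1 = (0, {(0, "a"): (2, 1), (1, "b"): (3, 1)}, {1: one})
d2 = (0, {(0, "a"): (1, 1), (1, "b"): (3, 2), (2, "b"): (3, 2)}, {1: two, 2: two})
d3 = (0, {(0, "a"): (2, 1), (1, "b"): (4, 1)}, {1: one})
print(equivalent(d1, d2), equivalent(d1, d3))
```

 After pushing, the arc and final weights at a state depend only on that state's suffix function up to a constant left factor, which is why the unweighted pairwise comparison suffices. The hypothetical automata d1 and d2 both assign weight 2·3^n to the string a b^n with weights distributed differently, while d3 assigns 2·4^n.</Paragraph>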
  </Section>
</Paper>