<?xml version="1.0" standalone="yes"?> <Paper uid="P04-3023"> <Title>On the Equivalence of Weighted Finite-state Transducers</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Incremental minimization </SectionTitle> <Paragraph position="0"> One application of this equivalence algorithm is the incremental minimization algorithm of Watson and Daciuk (2003). For every deterministic WFST T, there exists at least one equivalent WFST M such that no other equivalent WFST has fewer states (i.e. |Q_M| is minimal).</Paragraph> <Paragraph position="1"> In the unweighted case, this means that no two distinct states of the minimized transducer are equivalent.</Paragraph> <Paragraph position="2"> It follows that one way to build M is to compare every pair of distinct states in Q_T and merge equivalent pairs until no two states of the transducer are equivalent. An advantage of this method is that the transducer remains in a consistent state (equivalent to T) at every point during the algorithm; if the process has to finish within a time limit, it can simply be stopped. The number of states will then have decreased, even though the result is no longer guaranteed to be minimal.</Paragraph> <Paragraph position="3"> In the weighted case, merging two equivalent states is not as easy, because edges with the same label may carry different weights. In figure 3, states 1 and 2 are equivalent and can be merged, but their outgoing transitions have different weights. The remainder weights have to be pushed to the following states, which can then be merged if they are equivalent modulo the remainder weights. 
In figure 3, this is the case for states 3 and 4.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Generic Composition with Filter </SectionTitle> <Paragraph position="0"> As shown by Pereira and Riley (1997), the composition of WFSTs requires a special algorithm. It introduces a filter whose role is to handle epsilon transitions on the lower side of the upper transducer and on the upper side of the lower transducer (the filter is also useful in the unweighted case). In our implementation, described in section 5, we have generalized this composition-with-filter operation to handle two operations that are defined on automata only, namely intersection and cross-product. Intersection is a simple variant: it amounts to composing the identity transducers corresponding to the operand automata.</Paragraph> <Paragraph position="1"> Cross-product uses exactly the same algorithm but a different filter, shown in figure 4. The preprocessing stage for both operand automata consists of adding, at every final state, a self-loop transition labeled with a special symbol x and weighted -1. This makes it possible to match words of different lengths: once one of the automata is &quot;exhausted,&quot; the x symbol is added for as long as the other automaton is not. After the composition, the x symbol is replaced everywhere by epsilon.</Paragraph> <Paragraph position="2"> Figure 4 (cross-product filter): … matches any symbol; &quot;x&quot; is a special epsilon symbol introduced at the final states of the operand automata during preprocessing.</Paragraph> <Paragraph position="3"> The equivalence algorithm that is the subject of this paper is used together with WFST composition to provide an iterative composition operator. Given two transducers A and B, this operator composes A with B, then composes the result with B again, and so on, until a fixed point is reached. The fixed point is detected by testing the last two iterations for equivalence. 
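The iterative composition operator can be illustrated on a toy model. The following Python sketch rests on stated assumptions and is not the wfst implementation: transducers are replaced by finite weighted relations (a hypothetical `Relation` type mapping input/output string pairs to costs), weights are assumed to follow the tropical semiring (min, +), which the text does not specify, and plain dictionary equality stands in for the WFST equivalence test. The hypothetical `max_iter` bound guards against inputs with no fixed point, such as a B that keeps adding weight at every step.

```python
from typing import Dict, Tuple

# Hypothetical toy stand-in for a WFST: a finite weighted relation mapping
# (input string, output string) pairs to a cost. Weights are assumed to be
# in the tropical semiring (min, +); the text does not fix a semiring.
Relation = Dict[Tuple[str, str], float]


def compose(r: Relation, s: Relation) -> Relation:
    """Relational composition: (x, z) is kept when some y has (x, y) in r
    and (y, z) in s. Weights along a path are added (semiring product) and
    alternative paths keep the minimum weight (semiring sum)."""
    result: Relation = {}
    for (x, y), w1 in r.items():
        for (y2, z), w2 in s.items():
            if y == y2:
                key = (x, z)
                w = w1 + w2
                result[key] = min(w, result.get(key, w))
    return result


def iterative_compose(a: Relation, b: Relation, max_iter: int = 100) -> Relation:
    """Compose a with b, then the result with b again, and so on, stopping
    when the last two iterations are equal (the fixed point). Dict equality
    stands in for the WFST equivalence test described in the text."""
    current = a
    for _ in range(max_iter):
        nxt = compose(current, b)
        if nxt == current:  # fixed point: one more composition changes nothing
            return current
        current = nxt
    raise RuntimeError("no fixed point reached within max_iter iterations")


# Usage: B rewrites a to b (cost 1) and leaves b and c unchanged (cost 0).
A: Relation = {("x", "a"): 0.0, ("y", "c"): 2.0}
B: Relation = {("a", "b"): 1.0, ("b", "b"): 0.0, ("c", "c"): 0.0}
print(iterative_compose(A, B))  # {('x', 'b'): 1.0, ('y', 'c'): 2.0}
```

Here the second composition leaves the relation unchanged, so the operator stops and returns A o B, in which x is mapped to b with cost 1.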
Roche and Schabes (1994) have shown that, in the unweighted case, iterative composition makes it possible to parse context-free grammars with finite-state transducers; in our case, a cost can also be attached to the parse.</Paragraph> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 A Prototype Implementation </SectionTitle> <Paragraph position="0"> The algorithms described above have all been implemented in a prototype weighted finite-state tool called wfst, inspired by the Xerox tool xfst (Beesley and Karttunen, 2003) and the AT&amp;T FSM library (Mohri et al., 1997). From the former, it borrows a similar command-line interface and regular expression syntax; from the latter, the addition of weights. The system will be demonstrated and should be available for download soon.</Paragraph> <Paragraph position="1"> All the operations described above are available in wfst, in addition to classical operations such as union, intersection (defined on automata only), concatenation, etc. The regular expression syntax is inspired by xfst and Perl (the implementation language). For instance, the automaton of figure 3 was compiled from the regular expression (a/1 a/2 b/0* c/1) | (b/2 a/1 b/0* c/2), and the iterative composition of two previously defined WFSTs A and B is written $A %+ $B (we chose % as the composition operator, and + is the Kleene plus operator).</Paragraph> <Paragraph position="2"> Conclusion. We demonstrate a simple and powerful experimental weighted finite-state calculus tool and have described an algorithm, at the core of its operation, for testing the equivalence of weighted transducers. There are two major limitations to the weighted equivalence algorithm. The first is that it works only on deterministic WFSTs, and not all WFSTs can be determinized. An algorithm with backtracking may solve this problem at the cost of a longer running time, but it remains to be seen whether such an algorithm could apply to non-determinizable transducers. 
The other limitation is that two transducers recognizing the same rational relation may have non-equivalent underlying automata, in which some labels do not match (e.g. {a,epsilon}{b,c} vs. {a,c}{b,epsilon}). A possible solution to this problem is to consider the shortest string on both sides and to keep &quot;remainder strings,&quot; analogous to the remainder weights of the weighted case. If successful, this technique could also yield interesting results for determinization.</Paragraph> </Section> </Paper>