<?xml version="1.0" standalone="yes"?>
<Paper uid="W98-1306">
  <Title>Treatment of ε-Moves in Subset Construction</Title>
  <Section position="3" start_page="57" end_page="58" type="metho">
    <SectionTitle>
2 FSA Utilities
</SectionTitle>
    <Paragraph position="0"> The FSA Utilities tool-box is a collection of tools to manipulate regular expressions, finite-state automata and finite-state transducers (both string-to-string and string-to-weight transducers).</Paragraph>
    <Paragraph position="1"> Manipulations include determinisation (both for finite-state acceptors and finite-state transducers), minimisation, composition, complementation, intersection, Kleene closure, etc. Various visualisation tools are available to browse finite-state automata. The tool-box is implemented in SICStus Prolog.</Paragraph>
    <Paragraph position="2"> The motivation for the FSA Utilities tool-box has been the rapidly growing interest in finite-state techniques in computational linguistics. The FSA Utilities tool-box has been developed to provide:</Paragraph>
    <Paragraph position="4"> - Construction of finite automata on the basis of regular expressions. Regular expression operators include concatenation, Kleene closure, union and option (the standard regular expression operators). Furthermore, the extended regular expression operators are provided: complement, difference and intersection. Symbols can be intervals of symbols, or the 'Any' variable which matches any symbol. Regular expression operators are provided for operations on the underlying automaton, including minimisation and determinisation. Finally, we support user-defined regular expression operators.</Paragraph>
    <Paragraph position="5"> - We also provide operators for transductions such as composition, cross-product, same-length-cross-product, domain, range, identity and inversion.</Paragraph>
    <Paragraph position="6"> - Determinisation and Minimisation. Three different minimisation algorithms are supported: Hopcroft's algorithm (Hopcroft, 1971), Hopcroft and Ullman's algorithm (Hopcroft and Ullman, 1979), and Brzozowski's algorithm (Brzozowski, 1962).</Paragraph>
    <Paragraph position="7"> - Determinisation and minimisation of string-to-string and string-to-weight transducers (Mohri, 1996; Mohri, 1997).</Paragraph>
    <Paragraph position="8"> - Visualisation. Support includes built-in visualisation (Tcl/Tk, TeX+PicTeX, TeX+PsTricks, Postscript) and interfaces to third party graph visualisation software (Graphviz (dot), VCG, daVinci).</Paragraph>
    <Paragraph position="9"> - Random generation of finite automata (an extension of the algorithm in Leslie (1995) to allow the generation of finite automata containing ε-moves).</Paragraph>
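ε-moves typically enter such a tool-box through the regular expression operators themselves: for instance, the union of two automata can be built by adding a fresh start state with ε-moves to the operands' start states. The sketch below is a hypothetical Python rendering of that idea, not the tool-box's Prolog implementation; the tuple encoding (states, delta, starts, finals) and the use of None for ε are assumptions made here for illustration.

```python
def union(m1, m2):
    """Union of two epsilon-NFAs via a fresh start state with
    epsilon-moves to both operands' start states (Thompson-style).
    An automaton is a tuple (states, delta, starts, finals), where
    delta maps (state, symbol) to a set of states and the symbol
    None encodes epsilon. State names are assumed to be disjoint;
    the fresh state name 'S' is an assumption of this sketch."""
    states1, delta1, starts1, finals1 = m1
    states2, delta2, starts2, finals2 = m2
    start = 'S'  # fresh start state, assumed absent from both automata
    delta = dict(delta1)
    delta.update(delta2)
    # epsilon-moves from the fresh start state into both operands
    delta[(start, None)] = set(starts1).union(starts2)
    return (states1.union(states2, {start}),
            delta,
            {start},
            finals1.union(finals2))
```

The resulting automaton is nondeterministic and contains ε-moves even when both operands were deterministic, which is why the subset construction discussed below must deal with them.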
  </Section>
  <Section position="4" start_page="58" end_page="60" type="metho">
    <SectionTitle>
3 Subset Construction
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="58" end_page="60" type="sub_section">
      <SectionTitle>
3.1 Problem statement
</SectionTitle>
      <Paragraph position="0"> Let a finite-state machine M be specified by a tuple (Q, Σ, δ, S, F) where Q is a finite set of states, Σ is a finite alphabet, and δ is a function from Q × (Σ ∪ {ε}) → 2^Q. Furthermore, S ⊆ Q is a set of start states 4 and F ⊆ Q is a set of final states.</Paragraph>
      <Paragraph position="1"> Let ε-move be the relation {(q_i, q_j) | q_j ∈ δ(q_i, ε)}. ε-reachable is the reflexive and transitive closure of ε-move. Let ε-CLOSURE: 2^Q → 2^Q be a function which is defined as: ε-CLOSURE(Q') = {q | q' ∈ Q', (q', q) ∈ ε-reachable}. For any given finite-state automaton M = (Q, Σ, δ, S, F) there is an equivalent deterministic automaton M' = (2^Q, Σ, δ', {Q_0}, F'). F' is the set of all states in 2^Q containing a final state of M, i.e., the set of subsets {Q_i ∈ 2^Q | q ∈ Q_i, q ∈ F}. M' has a single start state Q_0 which is the epsilon closure of the start states of M, i.e., Q_0 = ε-CLOSURE(S). Finally, δ'({q_1, q_2, …, q_i}, a) = ε-CLOSURE(δ(q_1, a) ∪ δ(q_2, a) ∪ … ∪ δ(q_i, a)). An algorithm which computes M' for a given M will only need to take into account states in 2^Q which are reachable from the start state Q_0. This is the reason that for many input automata the algorithm does not need to treat all subsets of states (but note that there are automata for which all subsets are relevant, and hence exponential behaviour cannot be avoided in general).</Paragraph>
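The ε-CLOSURE function above is a plain reachability computation over the ε-move relation. A minimal Python sketch (a hypothetical encoding, not the paper's Prolog implementation) could look as follows, with delta mapping (state, symbol) to a set of states and the symbol None standing in for ε:

```python
def epsilon_closure(states, delta):
    """epsilon-CLOSURE: all states reachable from `states` using
    epsilon-moves only (reflexive and transitive closure of the
    epsilon-move relation). `delta` maps (state, symbol) to a set
    of target states; entries with symbol None are epsilon-moves."""
    closure = set(states)       # reflexive: every input state is included
    agenda = list(states)
    while agenda:
        q = agenda.pop()
        for q2 in delta.get((q, None), set()):
            if q2 not in closure:
                closure.add(q2)  # transitive: follow epsilon-moves further
                agenda.append(q2)
    return closure
```

For example, with ε-moves 0→1 and 1→2, the closure of {0} is {0, 1, 2}, matching the definition of ε-reachable as a reflexive and transitive closure.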
      <Paragraph position="2"> Consider the subset construction algorithm in figure 1. The algorithm maintains a set of subsets States. Each subset can be either marked or unmarked (to indicate whether the subset has been treated by the algorithm); the set of unmarked subsets is sometimes referred to 4 Note that a set of start states is required, rather than a single start state. Many operations on automata can be defined somewhat more elegantly in this way. Obviously, for deterministic automata this set should be a singleton set.</Paragraph>
      <Paragraph position="4"> while there is an unmarked subset T ∈ States do</Paragraph>
      <Paragraph position="6"> return (States, Σ, Trans, {Start}, Finals) end proc add(U) (reachable-state-set maintenance) if U ∉ States then add U unmarked to States if U ∩ F ≠ ∅ then Finals := Finals ∪ U fi</Paragraph>
      <Paragraph position="8"> as the agenda. The algorithm takes such an unmarked subset T and computes all transitions leaving T. This computation is performed by the function instructions and is called instruction computation by Johnson and Wood (1997).</Paragraph>
      <Paragraph position="9"> The function index_transitions constructs the function transitions : Q → 2^(Σ × 2^Q). This function returns for a given state p the set of pairs (s, T) representing the transitions leaving p. Furthermore, the function merge takes such a set of pairs and merges all pairs with the same first element (by taking the union of the corresponding second elements). For example: merge({(a, {2, 4}), (b, {2, 4}), (a, {3, 4}), (b, {5, 6})}) = {(a, {2, 3, 4}), (b, {2, 4, 5, 6})}. The procedure add is responsible for 'reachable-state-set maintenance', by ensuring that target subsets are added to the set of subsets if these subsets were not encountered before. Moreover, if such a new subset contains a final state, then this subset is added to the set of final states.</Paragraph>
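Putting the pieces of figure 1 together, the ε-free subset construction can be sketched in Python as follows. This is a hypothetical rendering under assumptions made here (dict-based delta, subsets as frozensets); the instructions function below performs the instruction computation, merging transitions per symbol as merge does, and the reachable-state-set maintenance of the procedure add is inlined in the loop. The paper's index_transitions would additionally index delta per state to avoid the scan over all transitions.

```python
def subset_construction(delta, starts, finals):
    """Subset construction for an NFA without epsilon-moves, following
    the worklist scheme of figure 1. `delta` maps (state, symbol) to a
    set of states. Returns (states, trans, start, det_finals) of the
    equivalent DFA; each DFA state is a frozenset of NFA states."""

    def instructions(T):
        # instruction computation: all transitions leaving the subset T,
        # merged per symbol (this plays the role of merge in the paper)
        merged = {}
        for (p, sym), targets in delta.items():
            if p in T:
                merged.setdefault(sym, set()).update(targets)
        return merged

    start = frozenset(starts)
    states = {start}                     # all subsets seen so far
    det_finals = {start} if start.intersection(finals) else set()
    trans = {}
    agenda = [start]                     # the unmarked subsets
    while agenda:
        T = agenda.pop()                 # mark T by removing it
        for sym, targets in instructions(T).items():
            U = frozenset(targets)
            trans[(T, sym)] = U
            if U not in states:          # reachable-state-set maintenance
                states.add(U)
                if U.intersection(finals):
                    det_finals.add(U)    # new subset containing a final state
                agenda.append(U)
    return states, trans, start, det_finals
```

Only subsets reachable from the start subset are ever created, which is why the algorithm often avoids the full 2^|Q| blow-up in practice.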
      <Paragraph position="11"/>
    </Section>
  </Section>
  <Section position="5" start_page="60" end_page="61" type="metho">
    <SectionTitle>
4 Three Variants for ε-Moves
</SectionTitle>
    <Paragraph position="0"> The algorithm presented in the previous section does not treat ε-moves. In this section three possible extensions of the algorithm are identified to treat ε-moves.</Paragraph>
    <Section position="1" start_page="60" end_page="60" type="sub_section">
      <SectionTitle>
4.1 Per graph
</SectionTitle>
      <Paragraph position="0"> This variant can be seen as a straightforward implementation of the constructive proof that for any given automaton with ε-moves there is an equivalent one without ε-moves (Hopcroft and Ullman, 1979)[pages 26-27].</Paragraph>
      <Paragraph position="1"> For a given M = (Q, Σ, δ, S, F) this variant first computes M' = (Q, Σ, δ', S', F), where S' = ε-CLOSURE(S), and δ'(q, a) = ε-CLOSURE(δ(q, a)). The function ε-CLOSURE is computed by using a standard transitive closure algorithm for directed graphs: this algorithm is applied to the directed graph consisting of all ε-moves of M. Such an algorithm can be found in several textbooks (see, for instance, Cormen, Leiserson, and Rivest (1990)).</Paragraph>
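The per graph variant can be sketched as a preprocessing pass that rewrites the automaton exactly as defined above, so that the unmodified subset construction can then be applied. This Python sketch is a hypothetical encoding; for brevity it computes the closure with a worklist per state set rather than with the single textbook transitive-closure pass over the whole ε-move graph that the paper uses.

```python
def remove_epsilon_moves(states, delta, starts, finals):
    """'Per graph' variant (sketch): build M' = (Q, Sigma, delta', S', F)
    with S' = epsilon-CLOSURE(S) and delta'(q, a) = epsilon-CLOSURE(delta(q, a)).
    `delta` maps (state, symbol) to a set of states; symbol None is epsilon.
    The returned automaton contains no epsilon-moves."""

    def closure(qs):
        # reflexive-transitive closure of the epsilon-move relation from qs
        result, agenda = set(qs), list(qs)
        while agenda:
            q = agenda.pop()
            for q2 in delta.get((q, None), set()):
                if q2 not in result:
                    result.add(q2)
                    agenda.append(q2)
        return result

    new_delta = {}
    for (q, a), targets in delta.items():
        if a is not None:                       # drop the epsilon-moves
            new_delta[(q, a)] = closure(targets)  # delta'(q, a)
    return states, new_delta, closure(starts), finals
```

Since the start set and every transition target are ε-closed, the final-state set F can stay unchanged, exactly as in the definition of M' above.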
      <Paragraph position="2"> The advantage of this approach is that the subset construction algorithm does not need to be modified at all. Moreover, the transitive closure algorithm is applied only once (for the full graph), whereas the following two variants call a specialised transitive closure algorithm possibly many times.</Paragraph>
    </Section>
    <Section position="2" start_page="60" end_page="61" type="sub_section">
      <SectionTitle>
4.2 Per subset and per state
</SectionTitle>
      <Paragraph position="0"> The per subset and the per state algorithms use a variant of the transitive closure algorithm for graphs. Instead of computing the transitive closure of a given graph, this algorithm only computes the closure for a given set of states. Such an algorithm is given in figure 2.</Paragraph>
      <Paragraph position="1"> funct closure(T) D := ∅ foreach t ∈ T do add t unmarked to D od while there is an unmarked state t ∈ D do</Paragraph>
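The marked/unmarked worklist of figure 2 can be rendered directly in Python; the fragment below mirrors that phrasing (a dict from state to its marked flag), under the same assumed encoding as before (delta maps (state, symbol) to a set of states, None encodes ε):

```python
def closure(T, delta):
    """Closure of a given set of states T, as in figure 2: compute
    epsilon-reachability only from T instead of building the transitive
    closure of the whole epsilon-move graph."""
    D = {t: False for t in T}       # state -> marked? (all unmarked at first)
    while not all(D.values()):      # while there is an unmarked state in D
        t = next(s for s, marked in D.items() if not marked)
        D[t] = True                 # mark t
        for u in delta.get((t, None), set()):
            if u not in D:
                D[u] = False        # add u unmarked to D
    return set(D)
```

The linear scan for an unmarked state keeps the sketch close to the figure; a production version would keep an explicit agenda instead.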
      <Paragraph position="3"> In either of the two integrated approaches, the subset construction algorithm is initialised with an agenda containing a single subset which is the ε-CLOSURE of the set of start-states of the input; furthermore, the way in which new transitions are computed also takes the effect of ε-moves into account. Both differences are accounted for by an alternative definition of the epsilon_closure function.</Paragraph>
      <Paragraph position="5"> The approach in which the transitive closure is computed for one state at a time is defined by the following definition of the epsilon_closure function. Note that we make sure that the transitive closure computation is only performed once for each input state, by memorising the</Paragraph>
      <Paragraph position="7"> variant 2: per state In the case of the per subset approach the closure algorithm is applied to each subset. We also memorise the closure function, in order to ensure that the closure computation is performed only once for each subset. This can be useful since the same subset can be generated many times during subset construction. The definition simply is: funct epsilon_closure(U) return memo(closure(U)) end variant 3: per subset The motivation for the per state approach may be the insight that in this case the closure algorithm is called at most |Q| times. In contrast, in the per subset approach the transitive closure algorithm may need to be called 2^|Q| times. On the other hand, in the per state approach some overhead must be accepted for computing the union of the results for each state. Moreover, in practice the number of subsets is often much smaller than 2^|Q|. In some cases, the number of reachable subsets is smaller than the number of states encountered in those subsets.</Paragraph>
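The two memoised variants can be contrasted in a single Python sketch. This is again a hypothetical encoding (the paper's memo is a Prolog construct); functools.lru_cache plays the role of the memo table, and frozensets are used so that subsets are hashable cache keys.

```python
from functools import lru_cache

def make_epsilon_closures(delta):
    """Build the two memoised epsilon_closure variants. `delta` maps
    (state, symbol) to a frozenset of states; symbol None is epsilon."""

    @lru_cache(maxsize=None)
    def closure_of_state(q):
        # variant 2 (per state): closure computed and cached once per
        # state, so at most |Q| closure computations overall
        result, agenda = {q}, [q]
        while agenda:
            p = agenda.pop()
            for p2 in delta.get((p, None), frozenset()):
                if p2 not in result:
                    result.add(p2)
                    agenda.append(p2)
        return frozenset(result)

    def per_state(subset):
        # union of per-state closures; the union itself is the overhead
        # the text mentions for this variant
        return frozenset().union(*(closure_of_state(q) for q in subset))

    @lru_cache(maxsize=None)
    def per_subset(subset):
        # variant 3 (per subset): closure cached once per subset; up to
        # 2^|Q| distinct subsets in the worst case, but usually far fewer
        result, agenda = set(subset), list(subset)
        while agenda:
            p = agenda.pop()
            for p2 in delta.get((p, None), frozenset()):
                if p2 not in result:
                    result.add(p2)
                    agenda.append(p2)
        return frozenset(result)

    return per_state, per_subset
```

Both functions compute the same closures; they differ only in what gets memoised, which is exactly the trade-off the experiments below measure.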
    </Section>
  </Section>
  <Section position="6" start_page="61" end_page="68" type="metho">
    <SectionTitle>
5 Experiments
</SectionTitle>
    <Paragraph position="0"> Two sets of experiments have been performed. In the first set of experiments a number of random automata are generated according to a number of criteria (based on Leslie (1995)). In the second set of experiments, results are provided for a number of (much larger) automata that surfaced during actual development work on finite-state approximation techniques.</Paragraph>
    <Paragraph position="1"> Random automata. Firstly, consider a number of experiments for randomly generated automata.</Paragraph>
    <Paragraph position="2"> Following Leslie (1995), the absolute transition density of an automaton is defined as the number of transitions divided by the square of the number of states times the number of symbols (i.e.</Paragraph>
    <Paragraph position="3"> the number of transitions divided by the number of possible transitions). Deterministic transition density is the number of transitions divided by the number of states times the number of symbols (i.e. the ratio of the number of transitions and the number of possible transitions in a deterministic machine). Leslie (1995) shows that deterministic transition density is a reliable measure for the difficulty of subset construction. Exponential blow-up can be expected for input automata with a deterministic transition density of around 2. A number of automata were generated randomly, according to the number of states, symbols, and transition density. The random generator makes sure that all states are reachable from the start state. For the first experiment, a number of automata was randomly generated, consisting of 15 symbols, and 15, 20, 25, 100 or 1000 states, using various densities (and no ε-moves).</Paragraph>
    <Paragraph position="5"> The results are summarised in figure 3. Only a single result is given since each of the implementations works equally well in the absence of ε-moves. A new concept called absolute jump density is introduced to specify the number of ε-moves. It is defined as the number of ε-moves divided by the square of the number of states (i.e., the probability that an ε-move exists for a given pair of states). Furthermore, deterministic jump density is the number of ε-moves divided by the number of states (i.e., the average number of ε-moves which leave a given state). In order to measure the differences between the three implementations, a number of automata has been generated consisting of 15 states and 15 symbols, using various transition densities between 0.01 and 0.3 (for larger densities the automata tend to collapse to an automaton for Σ*). For each of these transition densities, jump densities were chosen in the range 0.01 to 0.24 (again, for larger values the automaton collapses). In figure 4 the outcomes of this experiment are summarised by listing the average amount of CPU-time required per deterministic jump density (for each of the three algorithms). Thus, every dot represents the average for determinising a number of different input automata with various absolute transition densities and the same deterministic jump density. The figures 5, 6 and 7 summarise similar experiments using input automata with 20, 25 and 100 states. The striking aspect of these experiments is that the per graph algorithm is more efficient for lower deterministic jump densities, whereas, if the deterministic jump density gets larger, the per subset algorithm is more efficient. The turning point is around a deterministic jump density between 1 and 1.5, where it seems that for larger automata the turning point occurs at a lower deterministic jump density. Interestingly, this generalisation is supported by the experiments on automata which were generated by approximation techniques (although the results for randomly generated automata are more consistent than the results for 'real' examples).</Paragraph>
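The four density measures used in these experiments follow directly from the definitions above; a small helper makes the relationships explicit (the function name and dict keys are choices of this sketch, not the paper's):

```python
def densities(n_states, n_symbols, n_transitions, n_jumps):
    """Density measures for a random automaton, following Leslie (1995)
    and the definitions in the text. n_jumps is the number of epsilon-moves."""
    return {
        # transitions / possible transitions (states^2 * symbols)
        'absolute_transition': n_transitions / (n_states ** 2 * n_symbols),
        # transitions / (states * symbols), i.e. possible transitions of a DFA
        'deterministic_transition': n_transitions / (n_states * n_symbols),
        # epsilon-moves / possible epsilon-moves (states^2)
        'absolute_jump': n_jumps / n_states ** 2,
        # average number of epsilon-moves leaving a state
        'deterministic_jump': n_jumps / n_states,
    }
```

For instance, a 15-state, 15-symbol automaton with 450 transitions has deterministic transition density 2.0, right at the blow-up region Leslie identifies.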
    <Paragraph position="6"> Experiment: Automata generated by approximation algorithms The automata used in the previous experiments were randomly generated, according to a number of criteria. However, it may well be that in practice the automata that are to be treated by the algorithm have typical properties which were not reflected in this test data. For this reason results are presented for a number of automata that were generated using approximation techniques for context-free grammars (Pereira and Wright, 1997; Nederhof, 1997; Evans, 1997). In particular, a number of automata generated by Mark-Jan Nederhof using the technique described in Nederhof (1997) have been used. In addition, a small number of automata have been used which were generated using the technique of Pereira and Wright (1997) (as implemented by Nederhof).</Paragraph>
    <Paragraph position="7"> The automata typically contain lots of jumps. Moreover, the number of states of the resulting automaton is often smaller than the number of states in the input automaton. Results are given in table 1. CPU-time was measured on a HP 9000/780 machine running HP-UX 10.20, 240Mb, with SICStus Prolog 3#3. For comparison with an &amp;quot;industrial strength&amp;quot; implementation, we have applied the determiniser of AT&amp;T's FSM utilities to the same examples. The results show that for automata with very small transition densities FSM is faster (up to 2 or 3 times as fast), but for automata with larger densities the results are very similar; in some cases our Prolog implementation is even faster. Note finally that our timings do include IO, but not the start-up of the Prolog engine. One of the most striking examples is the ygrim automaton consisting of 3382 states</Paragraph>
    <Paragraph position="8"> We also provide the results for FSM again; we used the pipe fsmrmepsilon | fsmdeterminize. According to Fernando Pereira (p.c.) the comparison is less meaningful in this case because the fsmrmepsilon program treats weighted automata. This generalisation requires some overhead also in case no weights are used (for the determiniser this generalisation does not lead to any significant overhead). Pereira mentions furthermore that FSM used to include a determiniser with integrated treatment of jumps. Because this version could not (easily) be generalised for weighted automata it was dropped from the tool-set.</Paragraph>
    <Paragraph position="10"> and 10571 jumps. For this example, the per graph implementation ran out of memory (after a long time), whereas the per subset algorithm produced the determinised automaton relatively quickly. The FSM implementation took much longer for this example (whereas for many of the other examples it performs better than our implementations). Note that this example has the highest number of jumps per number of states ratio.</Paragraph>
    <Paragraph position="11"> input automaton … that the corresponding algorithm ran out of memory (after a long period of time) for that particular example.</Paragraph>
  </Section>
</Paper>