<?xml version="1.0" standalone="yes"?> <Paper uid="P96-1031"> <Title>An Efficient Compiler for Weighted Rewrite Rules</Title> <Section position="4" start_page="235" end_page="236" type="intro"> <SectionTitle> 5. Experiments </SectionTitle> <Paragraph position="0"> In order to compare the performance of the Mgorithm presented here with KK, we timed both algorithms on the compilation of individual rules taken from the following set (k * \[0, 10\]):</Paragraph> <Paragraph position="2"> In other words we tested twenty two rules where the left context or the right context is varied in length from zero to ten occurrences of c. For our experiments, we used the alphabet of a realistic application, the text analyzer for the Bell Laboratories German text-to-speech system consisting of 194 labels. All tests were run on a Silicon Graphics IRIS Indigo 4000, 100 MhZ IP20 Processor, 128 Mbytes RAM, running IRIX 5.2. Figure 9 shows the relative performance of the two algorithms for the left context: apparently the performance of both algorithms is roughly linear in the length of the left context, but KK has a worse constant, due to the larger number of operations involved. Figure 10 shows the equivalent data for the right context. At first glance the data looks similar to that for the left context, until one notices that in Figure 10 we have plotted the time on a log scale: the KK algorithm is hyperexponential.</Paragraph> <Paragraph position="3"> What is the reason for this performance degradation in the right context? The culprits turn out to be the two intersectands in the expression of Rightcontext(p, <, >) in Figure 1. Consider for example the righthand intersectand, namely ~0 > P>0~0- > ~0, which is the complement of ~0 > P>0~0- > ~0- As previously indicated, the complementation Mgorithm. requires determinization, and the determinization of automata representing expressions of the form ~*a, where c~ is a regular expression, is often very expensive, specially when the expression a is already complex, as in this case.</Paragraph> <Paragraph position="4"> Figure 11 plots the behavior of determinization on the expression Z~0 > P>0Z~0- > ~0 for each of the rules in the set a ~ b/__c k, (k e \[0, 10\]). On the horizontal axis is the number of arcs of the non-deterministic input machine, and on the vertical axis the log of the number of arcs of the deterministic machine, i.e. the machine result of the determinization algorithm without using any minimization. The perfect linearity indicates an exponential time and space behavior, and this in turn explains the observed difference in performance. In contrast, the construction of the right context machine in our algorithm involves only the single determinization of the automaton representing ~*p, and thus is much less expensive. The comparison just discussed involves a rather artificiM ruleset, but the differences in performance that we have highlighted show up in real applications. Consider two sets of pronunciation rules from the Bell Laboratories German text-to-speech system: the size of the alphabet for this ruleset is 194, as noted above. The first ruleset, consisting of pronunciation rules for the orthographic vowel <5> contains twelve rules, and the second ruleset, which deals with the orthographic a ~ b/ c k, (k E \[0, 10\]).</Paragraph> <Paragraph position="5"> vowel <a> contains twenty five rules. 
<Paragraph position="5"> In the actual application of the rule compiler to these rules, one compiles the individual rules in each ruleset one by one, composes them together in the order written, compacting after each composition, and so derives a single transducer for each set. When this is done off-line, the composition and compaction operations dominate the time taken to construct the transducer for each individual rule. The difference between the two algorithms still shows up clearly for these two sets of rules. Table 1 shows, for each algorithm, the time in seconds for the overall construction, and the number of states and arcs of the output transducers.</Paragraph> </Section> </Paper>
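A minimal sketch of the off-line construction just described, assuming hypothetical compile_rule, compose, and compact helpers (these names are placeholders, not an API from the paper): each rule is compiled to a transducer, the transducers are composed in the order the rules are written, and the result is compacted after every composition.

from functools import reduce

def build_ruleset_transducer(rules, compile_rule, compose, compact):
    """Compile each rewrite rule, compose the resulting transducers in the
    order written, and compact after every composition, yielding a single
    transducer for the whole ruleset."""
    transducers = [compile_rule(rule) for rule in rules]   # rule-by-rule compilation
    head, rest = transducers[0], transducers[1:]
    return reduce(lambda acc, t: compact(compose(acc, t)), rest, head)

Compacting after each composition keeps the intermediate transducers small; off-line, these composition and compaction steps are what dominate the total construction time, as noted above.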