<?xml version="1.0" standalone="yes"?>
<Paper uid="P02-1023">
<Title>Improving Language Model Size Reduction using Better Pruning Criteria</Title>
<Section position="9" start_page="111" end_page="111" type="evalu">
<SectionTitle> 5.3 Results </SectionTitle>
<Paragraph position="0"> We used the same training data described in Section 4 for bigram model training. We divided the test set described in Section 4 into two non-overlapping subsets. We performed testing on one subset containing 80% of the test set, and performed optimal function learning on the remaining 20% (referred to as held-out data below).</Paragraph>
<Paragraph position="1"> Take the combination of rank and entropy as an example. An uncompressed bigram model was first built using all training data. We then built a very large number of pruned bigram models using thresholds in the range [3E-12, 3E-6]. By evaluating the pruned models on the held-out data, optimal threshold settings can be found; some sample settings are shown in. In experiments, we found that a linear regression model of the form of Equation (6), whose coefficients are estimated from the sample settings, is powerful enough to learn a function that is close to the optimal one. Optimal functions of the other two criteria combinations were learned in the same way. In Figure 5, we present the results using models pruned with all three threshold-pairs defined by the functions in Table 4. As we expected, in all three cases, using a combination of two pruning criteria achieves consistently better performance than using either criterion separately. In particular, using the combination of rank and entropy, we obtained the best models over a wide range of CER values. This corresponds to a significant size reduction of 15-54% over probability-based LM pruning at the same CER. An example of the detailed comparison results is shown in Table 5.</Paragraph>
<Paragraph position="2"> There are two reasons for the superior performance of the combination of rank and entropy. First, rank-based pruning achieves very good performance, as described in Section 4. Second, as shown in Section 5.1, there is a relatively small overlap between the bigrams chosen by these two pruning criteria, so a large improvement can be achieved through the combination.</Paragraph>
</Section>
</Paper>
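
The threshold-selection procedure described in the first paragraph of Section 5.3 (building many pruned models over a grid of threshold pairs, scoring each on the held-out data, and fitting a linear regression, in the spirit of Equation (6), to the best settings) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the prune_fn and cer_fn callables are hypothetical stand-ins for the pruning and CER-evaluation steps described in Sections 3 and 4, and the simple least-squares line stands in for the regression model of Equation (6).

import numpy as np

def find_optimal_settings(model, held_out, rank_thresholds, entropy_thresholds,
                          prune_fn, cer_fn):
    """Grid-search (rank, entropy) threshold pairs on held-out data.

    prune_fn(model, rank_th, entropy_th) -> pruned model with a .num_bigrams attribute
    cer_fn(pruned_model, held_out)       -> character error rate (float)
    Both are supplied by the caller; they are placeholders for the paper's
    pruning and evaluation procedures.
    """
    best = {}  # model-size bucket -> (cer, rank_th, entropy_th)
    for r_th in rank_thresholds:
        for e_th in entropy_thresholds:
            pruned = prune_fn(model, r_th, e_th)
            cer = cer_fn(pruned, held_out)
            size = round(pruned.num_bigrams, -4)  # bucket models of similar size
            if size not in best or cer < best[size][0]:
                best[size] = (cer, r_th, e_th)
    return best

def fit_threshold_function(best):
    """Least-squares line mapping the rank threshold to the entropy threshold."""
    r = np.array([v[1] for v in best.values()])
    e = np.array([v[2] for v in best.values()])
    slope, intercept = np.polyfit(r, e, 1)
    return slope, intercept  # entropy_th is approximated by slope * rank_th + intercept

Once the function is learned on the 20% held-out split, a single free parameter (one threshold) determines the paired threshold, so a whole family of pruned models of different sizes can be produced without repeating the grid search.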
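
The second paragraph attributes the gain to the small overlap between the bigram sets selected by the two criteria. One plausible reading of "combination" is that a bigram is dropped only when both criteria agree it can be pruned; the sketch below illustrates that reading only. The rank_score and entropy_score callables are assumptions standing in for the criteria of the earlier sections, and the exact combination rule used in the paper may differ.

def prune_with_combined_criteria(bigrams, rank_score, entropy_score,
                                 rank_threshold, entropy_threshold):
    """Drop a bigram only when both criteria mark it as prunable.

    bigrams: dict mapping a bigram to whatever statistics the criteria need.
    rank_score / entropy_score: caller-supplied scoring functions (assumptions,
    not the paper's exact formulations).
    """
    kept = {}
    for bigram, stats in bigrams.items():
        drop_by_rank = rank_score(stats) < rank_threshold
        drop_by_entropy = entropy_score(stats) < entropy_threshold
        if not (drop_by_rank and drop_by_entropy):
            kept[bigram] = stats  # at least one criterion considers the bigram worth keeping
    return kept

Under this reading, because the two criteria select largely different bigrams (Section 5.1), requiring their agreement before pruning retains most bigrams that either criterion deems important, which is consistent with the improvement reported in Figure 5.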