<?xml version="1.0" standalone="yes"?> <Paper uid="W00-0714"> <Title>Using Perfect Sampling in Parameter Estimation of a Whole Sentence Maximum Entropy Language Model*</Title> <Section position="5" start_page="80" end_page="80" type="evalu"> <SectionTitle> 4 Experimental work </SectionTitle> <Paragraph position="0"> In this work, we carried out preliminary experiments using PS to estimate the expected value (4) during the learning of the parameters of a WSME model. We implemented the Cai algorithm (Cai, 1999) to obtain perfect samples. The Cai algorithm has the advantage that it does not require the definition of a partial order on the state space.</Paragraph> <Paragraph position="1"> The experiments were carried out on a pseudo-natural corpus, the &quot;traveler task&quot;, which consists of dialogs between travelers and hotel clerks. The size of the vocabulary is 693 words. The training set has 490,000 sentences and 4,748,690 words; the test set has 10,000 sentences and 97,153 words.</Paragraph> <Paragraph position="2"> Three kinds of features were used in the WSME model: n-grams (1-grams, 2-grams, 3-grams), distance-2 n-grams (d2-2-grams, d2-3-grams), and triggers. The prior proposal distribution used was a trigram model.</Paragraph> <Paragraph position="3"> We trained WSME models with different sets of features using the two sampling techniques, MCMC and PS. We measured the perplexity (PP) of each model and computed the percentage of improvement in PP with respect to a trigram base-line model (see Table 1). The first model used MCMC techniques (specifically, the Independence Metropolis-Hastings algorithm, IMH) and features of n-grams and distance-2 n-grams. The second model used the PS algorithm and features of n-grams and distance-2 n-grams. (Table 1 labels the models over the traveler-task corpus as follows: IMH with features of n-grams and d-n-grams (IMH), PS with n-grams and d-n-grams (PS), IMH with triggers (IMH-T), and PS with triggers (PS-T).)
The base-line model is a trigram model (Trigram). The third model used the IMH algorithm and features of triggers. The fourth used PS and features of triggers. Finally, in order to compare with the classical methods, we included the trigram base-line model.</Paragraph> <Paragraph position="4"> In all cases, the WSME models performed better than the n-gram model. From the results in Table 1, we see that trigger features improve the performance of the model more than n-gram features. This may be due to the correlation between the triggers and the n-grams: the n-gram information has already been absorbed by the prior distribution, which diminishes the effect of the n-gram features. We believe this is why PS-T in Table 1 is better than PS. We also see that IMH and IMH-T show the same improvement, i.e.</Paragraph> <Paragraph position="5"> the use of triggers does not seem to improve the perplexity of the model; but this may be due to the sampling technique: the parameter values depend on the estimation of an expected value, and that estimation depends on the sampling. Finally, PS-T has better perplexity than IMH-T. The only difference between the two is the sampling technique, and neither of them is affected by the feature correlation, so we think that the improvement may be due to the sampling technique.</Paragraph> </Section> </Paper>
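As background on the perfect-sampling side of the comparison, the sketch below shows the classic Propp-Wilson coupling-from-the-past (CFTP) construction on a toy 3-state Markov chain. This is a generic illustration of how perfect sampling yields exact draws from the stationary distribution, NOT Cai's (1999) algorithm: Cai's variant differs precisely in that it avoids the partial-order requirement that textbook CFTP constructions lean on. All names and the transition matrix are invented for illustration.

```python
import random

# Toy ergodic 3-state chain (rows sum to 1); illustrative only.
P = [
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
]

def step(state, u):
    """Deterministic update rule: move from `state` using uniform draw `u`
    via inverse-CDF sampling of the transition row."""
    acc = 0.0
    for nxt, p in enumerate(P[state]):
        acc += p
        if u < acc:
            return nxt
    return len(P) - 1

def cftp(rng):
    """Coupling from the past: run coupled chains from EVERY state, starting
    at time -T and reusing the same randomness per time step; double T until
    all chains coalesce.  The common value at time 0 is an exact draw from
    the stationary distribution."""
    T = 1
    draws = {}  # randomness for time t must be reused across restarts
    while True:
        states = list(range(len(P)))
        for t in range(-T, 0):
            u = draws.setdefault(t, rng.random())
            states = [step(s, u) for s in states]
        if len(set(states)) == 1:
            return states[0]
        T *= 2

rng = random.Random(0)
samples = [cftp(rng) for _ in range(2000)]
```

The empirical frequencies of `samples` match the stationary distribution of `P` without any burn-in tuning, which is exactly the property that motivates using PS instead of plain MCMC for the expectation in the parameter updates.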
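The MCMC baseline in the comparison estimates the expected value via an Independence Metropolis-Hastings sampler: proposals are drawn from a fixed distribution (the trigram prior, in the paper's setting) regardless of the current state, and accepted with the usual ratio. The sketch below illustrates this estimator on a toy discrete target; `Q` stands in for the trigram proposal and `log_score` for the unnormalised WSME log-probability. All names and distributions here are illustrative assumptions, not the paper's actual model.

```python
import math
import random

rng = random.Random(1)

STATES = [0, 1, 2, 3]
Q = [0.4, 0.3, 0.2, 0.1]  # toy proposal distribution (stand-in for the trigram prior)

def log_score(x):
    """Unnormalised log-probability of the toy target: p(x) ~ exp(0.5 * x)."""
    return 0.5 * x

def propose():
    """Draw a state from Q by inverse-CDF sampling."""
    u, acc = rng.random(), 0.0
    for x, q in zip(STATES, Q):
        acc += q
        if u < acc:
            return x
    return STATES[-1]

def imh_expectation(f, n_samples=20000, burn_in=2000):
    """Estimate E_p[f] with an independence sampler: propose y ~ Q and
    accept with probability min(1, p(y)q(x) / (p(x)q(y)))."""
    x = propose()
    total, count = 0.0, 0
    for t in range(n_samples + burn_in):
        y = propose()
        log_ratio = (log_score(y) - log_score(x)
                     + math.log(Q[x]) - math.log(Q[y]))
        if math.log(rng.random()) < log_ratio:
            x = y
        if t >= burn_in:
            total += f(x)
            count += 1
    return total / count

mean_x = imh_expectation(lambda x: x)
```

Unlike the PS draws above, these samples are only asymptotically from the target, so the quality of the expectation estimate (and hence of the learned parameters) depends on burn-in and on how well the proposal matches the target; this is the dependence on the sampling technique that the closing paragraph invokes.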