<?xml version="1.0" standalone="yes"?> <Paper uid="W00-0709"> <Title>Overfitting Avoidance for Stochastic Modeling of Attribute-Value Grammars</Title> <Section position="7" start_page="51" end_page="52" type="evalu"> <SectionTitle> 5 Results </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="51" end_page="51" type="sub_section"> <SectionTitle> 5.1 Performance of unmerged models </SectionTitle> <Paragraph position="0"> Of the unmerged models, as expected, the one trained on the smaller set shows the worst performance and the most drastic overfitting. Its performance peaks at approximately 42.5% early on, at around 20 iterations of IIS, then drops to 40.5% by around 50 iterations. At around 80 iterations it plunges to about 39%, where it remains. This model's performance is shown in figure 2, represented by the solid black line.</Paragraph> <Paragraph position="1"> In figure 3, the solid black line represents the original model trained on 4600 sentences. The feature set is the same, although in this case all of the 38,057 features are active. The advantage of having so much more training data is evident. The performance peaks at a much higher level, and overfitting, although present, is much less drastic at the end of 150 iterations. Nevertheless, the curve still reaches a maximum fairly early on, at about 40 iterations, and performance diminishes from there.</Paragraph> </Section> <Section position="2" start_page="51" end_page="52" type="sub_section"> <SectionTitle> 5.2 Performance of merged models </SectionTitle> <Paragraph position="0"> Different cutoffs yielded varying degrees of improvement. A cutoff of 100 elements seemed to make no meaningful difference either way with either model. Increasing the cutoff for the model trained on 498 sentences both lowered the peak before 40 iterations and raised the dip after 80 in a fairly regular fashion. The best balance seemed to be struck with a cutoff of 1250; in this case, the number of active features in the model was reduced to 4,801.</Paragraph> <Paragraph position="1"> As can be seen from figure 2, the merged model, represented by the dotted line, shows a much more predictable improvement, its curve much closer to the optimal asymptotic improvement. In terms of actual performance, the early peak of the unmerged model is not present at all in the merged model, which catches up between around 40 and 80 iterations. After 80 iterations, the merged model begins to outperform the unmerged model, which has begun to suffer from severe overfitting. The merged model, on the other hand, shows no evidence of overfitting.</Paragraph> <Paragraph position="2"> (Figure 2 caption: models trained on 498 sentences; features containing elements appearing fewer than 1250 times are merged. The early peak of the unmerged model gives way to drastic overfitting; the merged model does not reach this peak, but no overfitting is present.)</Paragraph> <Paragraph position="3"> Likewise, the merged model represented by the dotted line in figure 3 shows no overfitting either, an improvement in that regard over its unmerged counterpart. For this model, the best cutoff of those tried appeared to be 500, and the number of active features was reduced to 77,286. Higher cutoffs led to slower rates of improvement and lower levels of performance.</Paragraph>
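To make the cutoff operation concrete, the following Python fragment is a minimal sketch of one plausible reading of frequency-based feature merging, not the authors' implementation: it assumes, hypothetically, that each feature is a tuple of atomic elements with an empirical count, and it pools every feature containing an element that occurs fewer than cutoff times into a shared bin keyed by its rare elements.

```python
from collections import Counter

def merge_rare_features(feature_counts, cutoff):
    """Pool features that contain rare elements (illustrative sketch only)."""
    # Total occurrences of each element, summed over all features.
    element_counts = Counter()
    for feature, count in feature_counts.items():
        for element in feature:
            element_counts[element] += count

    merged = Counter()
    for feature, count in feature_counts.items():
        rare = sorted(e for e in feature if element_counts[e] < cutoff)
        if rare:
            # Collapse this feature into a pooled bin shared by all
            # features containing the same rare elements.
            merged[("MERGED",) + tuple(rare)] += count
        else:
            merged[feature] += count
    return merged

# Toy usage: two sparse features sharing a rare element collapse into one bin.
counts = Counter({("S", "NP_subj"): 2000, ("VP", "NP_obj"): 1500,
                  ("S", "rare_tag"): 3, ("VP", "rare_tag"): 2})
print(merge_rare_features(counts, cutoff=1250))
```

Under this assumed scheme, many sparse features collapse into far fewer pooled bins, which is the kind of reduction in active features reported above; the paper's actual merging operation may differ in its details.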
<Paragraph position="4"> Both merging operations may be viewed as yielding improvements over the unmerged models, since the accuracy of the model should ideally increase with each iteration of the IIS algorithm until it converges. It is likely that further iterations would yield even clearer improvement, although it is also possible that the merged models themselves would begin to exhibit overfitting at some point. The rate of increase in performance and the point of onset of overfitting vary from model to model. In general, predictable improvement, even if gradual, is preferable to sporadic peaking and drastic overfitting, although this may not always be the case in practice.</Paragraph> </Section> </Section> </Paper>