<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1661"> <Title>Statistical Ranking in Tactical Generation</Title> <Section position="9" start_page="523" end_page="523" type="concl"> <SectionTitle> 5 Summary and Outlook </SectionTitle> <Paragraph position="0"> Applying three alternative statistical models to the realization ranking task, we found that discriminative models with access to structural information substantially outperform the traditional language model approach. Using comparatively small amounts of annotated training data, we were able to boost ranking performance from around 54% to more than 72%, albeit for a limited, reasonably coherent domain and genre. The incremental addition of feature templates to the MaxEnt model suggests a trend of diminishing returns, most likely due to increasing overlap in the portions of the problem space captured by different templates, and possibly also reflecting limits on the amount of training data. The comparison of the MaxEnt and SVM rankers suggests comparable performance on our task, with no statistically significant differences. Nevertheless, in terms of scalability to large data sets, it seems clear that the MaxEnt framework is the more practical and manageable alternative, both in training time and in memory requirements.</Paragraph> <Paragraph position="1"> In future work, we would like to train an SVM that takes full advantage of the ranking potential of the set-up described in (Joachims, 2002).</Paragraph> <Paragraph position="2"> Instead of making only binary (right/wrong) distinctions, we could grade the realizations in the training data according to their WA scores against the references and try to learn a corresponding ranking.</Paragraph> <Paragraph position="3"> So far we have only been able to run preliminary experiments with this set-up on a small subset of the data.
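As a rough illustration of the graded set-up, the Python snippet turns WA scores for the candidate realizations of a single input into pairwise preferences of the kind a ranking SVM in the style of Joachims (2002) is trained on, and counts how many of those pairs a model's scores order the wrong way round. It is a minimal sketch under our own assumptions, not the set-up used in these experiments: the helpers preference_pairs and swapped_pairs are hypothetical, tied WA scores yield no preference, and the WA scores and model scores are taken as given inputs.

```python
from itertools import combinations
from operator import gt


def preference_pairs(wa_scores):
    """Hypothetical helper: return (better, worse) index pairs for all
    candidate realizations of one input whose WA scores differ.

    Each pair corresponds to one pairwise ranking constraint;
    tied candidates yield no constraint.
    """
    # Sort candidate indices by descending WA score (stable for ties), so
    # that in every combination (a, b) candidate a scores at least as high as b.
    order = sorted(range(len(wa_scores)), key=lambda i: wa_scores[i], reverse=True)
    return [(a, b) for a, b in combinations(order, 2) if wa_scores[a] != wa_scores[b]]


def swapped_pairs(model_scores, wa_scores):
    """Hypothetical helper: count preference pairs that the model ranks in
    reverse order, i.e. the worse candidate gets a strictly higher score."""
    # operator.gt stands in for the comparison operator so that the snippet
    # contains no angle brackets inside the surrounding XML text.
    return sum(
        1
        for better, worse in preference_pairs(wa_scores)
        if gt(model_scores[worse], model_scores[better])
    )


wa = [0.9, 0.5, 0.7, 0.5]        # invented WA scores for four candidates
model = [2.0, 1.5, 1.0, 0.0]     # invented model scores for the same candidates
print(preference_pairs(wa))      # [(0, 2), (0, 1), (0, 3), (2, 1), (2, 3)]
print(swapped_pairs(model, wa))  # 1
```

Counting swapped pairs in this way scores the whole ranking rather than only the top-ranked candidate; the numbers in the example are invented purely for illustration.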
When evaluated with the accuracy measures used in this paper, the results were not as good as those obtained when training with only two ranks; however, this might well look different if we evaluated the full rankings (e.g. by the number of swapped pairs) instead of focusing only on the top-ranked candidates. Note that such graded training data can also be used with the MaxEnt models, by basing the probabilities of the empirical distribution on similarity scores such as WA instead of on frequencies.</Paragraph> </Section> </Paper>