<?xml version="1.0" standalone="yes"?> <Paper uid="W03-0502"> <Title>Sub-event based multi-document summarization</Title> <Section position="12" start_page="3" end_page="3" type="concl"> <SectionTitle> 11. Conclusions </SectionTitle> <Paragraph position="0"> While the lead-based policy from our first experiment still outperforms all of our automatic cluster-based summaries at the 10% and 20% levels, our findings about SAS are important for future efforts to summarize by partitioning.</Paragraph> <Paragraph position="1"> As discussed, the pyramid structure of news articles may have boosted the scores of the lead-based policy. In applications of summarizers where the information is not presorted, we believe that clustering followed by extraction with SAS could offer the best results.</Paragraph> <Paragraph position="2"> We conclude that multi-document summarization is improved by two specific elements. The first is taking into account varying degrees of relevance, as opposed to a polarized relevant/non-relevant metric. The second is recognizing the sub-events that comprise a single news event.</Paragraph> <Paragraph position="3"> 12. Future Work In future work, we see four areas for improvement. First, we would like to improve our simple cluster-based algorithm; Hatzivassiloglou et al. (2001) have shown several ways of doing this. Second, we would like to have human judges evaluate the final summaries and give scores based on how well each summary captures the most relevant parts of the document cluster and how well it avoids repetition. This would allow us to assess both the effectiveness of the RU method and how well our summarizer is functioning. Third, we would like to run a machine learning algorithm on a number of different and varied clusters to find which parameter settings work best for each type of cluster.
We suspect that the optimal number of original clusters, and the choice of ISF or IDF, could be determined by the amount of redundancy in the cluster and the desired size of the extract, but more work remains to be done on this. Finally, we need to test the best clustering method against other methods: centroid-based, MMR, lexical-chain, and keyword-based, to name a few.</Paragraph> </Section></Paper>