<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-1019">
<Title>Evaluating Content Selection in Summarization: The Pyramid Method</Title>
<Section position="7" start_page="9" end_page="9" type="concl">
<SectionTitle>
5 Conclusions
</SectionTitle>
<Paragraph position="0"> There are many open questions about how to parameterize a summary for specific goals, which makes evaluation in itself a significant research question (Jing et al., 1998).</Paragraph>
<Paragraph position="1"> Instead of attempting to develop a method for eliciting reliable judgments from humans, we chose to calibrate our method to human summarization behavior.</Paragraph>
<Paragraph position="2"> The strengths of pyramid scores are that they are reliable, predictive, and diagnostic. The pyramid method not only assigns a score to a summary but also lets the investigator identify which important information is missing, so it can be used directly to target improvements to the summarizer. A further diagnostic strength is that it captures the relative difficulty of source texts, which allows a fair comparison of scores across different input sets; this is not the case with the DUC method.</Paragraph>
<Paragraph position="3"> We hope to address two drawbacks of our method in future work. First, pyramid scores ignore interdependencies among content units, including ordering. However, our SCU-annotated summaries and their correlated pyramids provide a valuable data resource that will allow us to investigate such questions. Second, creating an initial pyramid is laborious, so large-scale application of the method would require an automated or semi-automated approach.</Paragraph>
<Paragraph position="4"> We have started exploring the feasibility of automation, and we are collecting additional data sets.</Paragraph>
</Section>
</Paper>
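A minimal Python sketch of the scoring idea described above, assuming (as in the paper) that an SCU's weight is the number of model summaries expressing it, and that the ideal score for a peer summary with n SCUs is the sum of the n highest SCU weights in the pyramid; the function and variable names here are hypothetical, not from the paper.

    def pyramid_score(peer_scu_weights, pyramid_scu_weights):
        """Pyramid score: weight of the SCUs the peer summary expresses,
        divided by the weight of an ideally informative summary that
        expresses the same number of SCUs.

        peer_scu_weights:    weights of the SCUs found in the peer summary.
        pyramid_scu_weights: weights of all SCUs in the pyramid.
        """
        observed = sum(peer_scu_weights)
        n = len(peer_scu_weights)
        # An optimal summary with n SCUs expresses the n highest-weight SCUs.
        ideal = sum(sorted(pyramid_scu_weights, reverse=True)[:n])
        return observed / ideal if ideal else 0.0

    # Illustrative use: a pyramid built from 4 model summaries (weights 1-4).
    pyramid = [4, 4, 3, 2, 2, 1, 1, 1]
    peer = [4, 3, 1]                      # SCUs the peer expresses
    print(pyramid_score(peer, pyramid))   # 8 / (4+4+3) = 0.727...

The same bookkeeping supports the diagnostic use noted above: any high-weight SCU in the pyramid that is absent from the peer's SCU set marks important content the summarizer missed.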