<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-2020">
  <Title>Topic-Focused Multi-document Summarization Using an Approximate Oracle Score</Title>
  <Section position="10" start_page="157" end_page="158" type="evalu">
    <SectionTitle>
8 Results
</SectionTitle>
    <Paragraph position="0"> Figure 4 gives the ROUGE-2 scores with error bars for the approximations of the oracle score as well as the ROUGE-2 scores for the human summarizers and the top performing systems at DUC 2005. In the graph, qs is the approximate oracle, qs(p) is the approximation using linguistic preprocessing, and qs(pr) is the approximation with both linguistic preprocessing and redundancy removal.</Paragraph>
    <Paragraph position="1"> Note that while there is some improvement using the linguistic preprocessing, the improvement using our redundancy removal technique is quite minor. Regardless, our system using signature terms and query terms as estimates for the oracle score performs comparably to the top scoring system at DUC 05.</Paragraph>
    <Paragraph position="2"> Table 3 gives the ROUGE-2 scores for the recent DUC 06 evaluation which was essentially the same task as for DUC 2005. The manner in which the linguistic preprocessing is performed has changed from DUC 2005, although the types of removals have remained the same. In addition, pseudo-relevance feedback was employed for redundancyremovalasmentionedearlier. See(Conroy et. al. 2005) for details.</Paragraph>
    <Paragraph position="3"> While the main focus of this study is task-oriented multidocument summarization, it is instructive to see how well such an approach would perform for a generic summarization task as with the 2004 DUC Task 2 dataset. Note, the o score for generic summaries uses only the signature term portion of the score, as no topic description is given. We present ROUGE-1 (rather than</Paragraph>
    <Section position="1" start_page="157" end_page="158" type="sub_section">
      <SectionTitle>
Humans A-I
</SectionTitle>
      <Paragraph position="0"> ROUGE-2) scores with stop words removed for comparison with the published results given in (Nenkova and Vanderwende, 2005).</Paragraph>
      <Paragraph position="1"> Table 4 gives these scores for the top performing systems at DUC04 as well as SumBasic and o(pr)qs , the approximate oracle based on signature terms alone with linguistic preprocess trimming and pivot QR for redundancy removal. As displayed, o(pr)qs scored second highest and within the 95% confidence intervals of the top system, peer 65, as well as SumBasic, and peer 34.</Paragraph>
  </Section>
class="xml-element"></Paper>