<?xml version="1.0" standalone="yes"?> <Paper uid="W03-1101"> <Title>Improving Summarization Performance by Sentence Compression - A Pilot Study</Title> <Section position="7" start_page="0" end_page="0" type="evalu"> <SectionTitle> 6 Results </SectionTitle> <Paragraph position="0"> Tables 1 and 2 summarize the results. Analyzing all runs according to these two tables, we made the following observations.</Paragraph> <Paragraph position="1"> (1) Selecting compressed sentences using length-adjusted scores (K&M) without any modification performed significantly worse (at α = 5%, table cells marked in dark gray in Table 2) than all other runs. This indicates that we cannot rely on purely syntax-based compression to improve overall system performance, although the compression algorithm performed well at the individual sentence level.</Paragraph> <Paragraph position="2"> (2) The original run (ORG) achieved an average unigram co-occurrence score of 0.253 and was significantly better than all other runs except the ORACLE and SIGKMb runs. This result was somewhat discouraging; it means that most of the reranking strategies we tried were not useful, and indicates that we need to invest more time in finding a better way to rank the compressed sentences. Purely syntactic ranking (noisy-channel model), shallow semantic ranking (topic signatures), or simple combinations of the two did not improve system performance and in some cases even degraded it.</Paragraph> <Paragraph position="3"> (3) Given that the ORACLE run (0.287) scored above the average human performance of 0.270 (not shown in the tables), we should remain optimistic about finding a better ranking algorithm to select the best compression. However, the low human performance poses a challenge for machine learning algorithms trying to learn this function. We provide a more in-depth discussion of this issue in other papers (Lin and Hovy, 2002; Lin and Hovy, 2003b).</Paragraph> <Paragraph position="4"> (4) That the ORACLE run did not achieve a higher score also implies the following: a.
The sentence compression algorithm that we used might drop some important content. Therefore the compressed summaries did not achieve the 20% increase in performance that Figure 1 might suggest is possible when systems are allowed to output summaries 100% longer than the given constraint (i.e., if a 100-word summary is requested, a system can provide a 200-word summary in response). b. The way we generated our compressed summaries was not effective. We might need to optimize and select compressions according to a global optimization function.</Paragraph> <Paragraph position="5"> For example, if some important content is already mentioned in sentences included in a summary, we would want to take this into account and add compressions that carry new information to the final summary.</Paragraph> </Section> </Paper>