<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-4001">
<Title>Using N-Grams to Understand the Nature of Summaries</Title>
<Section position="3" start_page="0" end_page="2" type="relat">
<SectionTitle>2 Related Work</SectionTitle>
<Paragraph position="0">Jing (2002) previously examined the degree to which single-document summaries can be characterized as extractive. Based on a manual inspection of 15 human-written summaries, she proposes that in single-document summarization, human summarizers follow a &quot;cut-and-paste&quot; approach built on six main operations: sentence reduction, sentence combination, syntactic transformation, reordering, lexical paraphrasing, and generalization or specification.</Paragraph>
<Paragraph position="1">The first four operations are reflected in the construction of an HMM that can be used to decompose human summaries. According to this model, 81% of the summary sentences in a corpus of 300 human-written summaries of news articles on telecommunications fit the cut-and-paste method; the rest are believed to have been composed from scratch.</Paragraph>
<Paragraph position="2">Another recent study (Lin and Hovy, 2003) investigated the extent to which extractive methods may be sufficient for summarization in the single-document case. By computing a performance upper bound for pure sentence extraction, they found that state-of-the-art extraction-based systems are still 15%-24% away from this limit, and 10% away from average human performance. While this sheds light on how much can be gained by optimizing sentence extraction methods for single-document summarization, to our knowledge, no one has assessed the potential of extraction-based systems for summarizing multiple documents.</Paragraph>
</Section>
</Paper>