<?xml version="1.0" standalone="yes"?> <Paper uid="N04-4001"> <Title>Using N-Grams to Understand the Nature of Summaries</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> The explosion of available online text has made it necessary to present information in a succinct, navigable manner. The increased accessibility of worldwide online news sources and the continually expanding size of the World Wide Web place demands on users attempting to wade through vast amounts of text.</Paragraph> <Paragraph position="1"> Document clustering and multi-document summarization technologies working in tandem promise to ease some of the burden on users when browsing related documents.</Paragraph> <Paragraph position="2"> Summarizing a set of documents brings about challenges that are not present when summarizing a single document. One might expect that a good multi-document summary will present a synthesis of multiple views of the event being described over different documents, or present a high-level view of an event that is not explicitly reflected in any single document. A useful multi-document summary may also indicate the presence of new or distinct information contained within a set of documents describing the same topic (McKeown et al., 1999; Mani and Bloedorn, 1999). To meet these expectations, a multi-document summary is required to generalize, condense and merge information coming from multiple sources.</Paragraph> <Paragraph position="3"> Although single-document summarization is a well-studied task (see Mani and Maybury, 1999 for an overview), multi-document summarization has only recently been studied closely (Marcu and Gerber 2001).</Paragraph> <Paragraph position="4"> While close attention has been paid to multi-document summarization technologies (Barzilay et al. 2002; Goldstein et al. 2000), the inherent properties of human-written multi-document summaries have not yet been quantified. In this paper, we seek to empirically characterize ideal multi-document summaries, in part by attempting to answer two questions: Can multi-document summaries that are written by humans be characterized as extractive or generative? Are multi-document summaries less extractive than single-document summaries? Our aim in answering these questions is to discover how the nature of multi-document summaries will impact our system requirements.</Paragraph> <Paragraph position="5"> We have chosen to focus our experiments on the data provided for summarization evaluation during the Document Understanding Conference (DUC). While we recognize that other summarization corpora may exhibit properties different from those we report, the data prepared for DUC evaluations is widely used and continues to be a powerful force in shaping directions in summarization research and evaluation.</Paragraph> <Paragraph position="6"> In the following section we describe previous work related to investigating the potential for extractive summaries. Section 3 describes a new approach for assessing the degree to which a summary can be described as extractive, and reports our findings for both single- and multi-document summarization tasks. We conclude with a discussion of our findings in Section 4.</Paragraph> </Section> </Paper>