File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/00/w00-0405_intro.xml
Size: 3,024 bytes
Last Modified: 2025-10-06 14:00:53
<?xml version="1.0" standalone="yes"?> <Paper uid="W00-0405"> <Title>Multi-Document Summarization By Sentence Extraction</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> With the continuing growth of online information, it has become increasingly important to provide better mechanisms for finding and presenting textual information effectively. Conventional IR systems find and rank documents by maximizing relevance to the user query (Salton, 1970; van Rijsbergen, 1979; Buckley, 1985; Salton, 1989). Some systems also include sub-document relevance assessments and convey this information to the user. More recently, single-document summarization systems have been developed that provide an automated generic abstract or a query-relevant summary (TIPSTER, 1998a).[1] However, large-scale IR and summarization have not yet been truly integrated, and the functional challenges facing a summarization system are greater in a true IR or topic-detection context (Yang et al., 1998; Allan et al., 1998). [Footnote 1: Most of these were based on statistical techniques applied to various document entities (e.g., Tait, 1983; Kupiec et al., 1995; Paice, 1990; Klavans and Shaw, 1995; McKeown et al., 1995; Shaw, 1995; Aone et al., 1997; Boguraev and Kennedy, 1997; Hovy and Lin, 1997; Mitra et al., 1997; Teufel and Moens, 1997; Barzilay and Elhadad, 1997; Carbonell and Goldstein, 1998; Baldwin and Morton, 1998; Radev and McKeown, 1998; Strzalkowski et al., 1998).]</Paragraph> <Paragraph position="1"> Consider the situation where the user issues a search query, for instance on a news topic, and the retrieval system finds hundreds of closely ranked documents in response. Many of these documents are likely to repeat much of the same information, while differing in certain parts.
Summaries of the individual documents would help, but they are likely to be very similar to each other unless the summarization system takes into account the other summaries that have already been generated. Multi-document summarization, capable of summarizing either complete document sets or single documents in the context of previously summarized ones, is likely to be essential in such situations. Ideally, multi-document summaries should contain the key relevant information shared among all the documents only once, plus other information, unique to some of the individual documents, that is directly relevant to the user's query.</Paragraph> <Paragraph position="2"> Though many of the techniques used in single-document summarization can also be used in multi-document summarization, there are at least four significant differences: 1. The degree of redundancy in the information contained within a group of topically related articles is much higher than the degree of redundancy within a single article, as each article is apt to describe the main point as well as the necessary shared background. Hence anti-redundancy methods are more crucial.</Paragraph> </Section> </Paper>