<?xml version="1.0" standalone="yes"?>
<Paper uid="J98-3005">
  <Title>Generating Natural Language Summaries from Multiple On-Line Sources</Title>
  <Section position="2" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> One of the major problems with the Internet is the abundance of information and the resulting difficulty a typical computer user faces in reading all existing documents on a specific topic. Even within the domain of current news, this task is infeasible.</Paragraph>
    <Paragraph position="1"> There are now more than 100 sources of live newswire on the Internet, most of them accessible through the World-Wide Web (Berners-Lee 1992). Some of the most popular sites include news agencies and television stations such as Reuters News (Reuters 1996), CNN's Web (CNN 1996), and ClariNet's e.News on-line newspaper (ClariNet 1996), as well as on-line versions of print media such as the New York Times on the Web edition (NYT 1996).</Paragraph>
    <Paragraph position="2"> For the typical user, it is nearly impossible to go through megabytes of news every day to select the articles he wishes to read. Even if the user manages to select all the news relevant to a topic of interest, he still faces the problem of choosing, from that immense body of news, a small subset that he can read in a limited time. Hence, there is a need for search and selection services, as well as for summarization facilities.</Paragraph>
    <Paragraph position="3"> There currently exist more than 40 search and selection services on the World-Wide Web, such as DEC's Altavista (Altavista 1996), Lycos (Lycos 1996), and DejaNews (DejaNews 1997), all of which allow keyword searches for recent news. However, only recently have there been practical results in the area of summarization.</Paragraph>
    <Paragraph position="4"> Summaries can be used to determine whether any of the retrieved articles are relevant (thereby allowing the user to avoid reading those that are not), or they can be read in place of the articles to learn about information of interest to the user. Existing summarization systems (e.g., Preston and Williams 1994; Cuts 1994; NetSumm 1996; Kupiec, Pedersen, and Chen 1995; Rau, Brandow, and Mitze 1994) typically use statistical techniques to extract relevant sentences from a news article. This domain-independent approach produces a summary of a single article at a time, which can indicate to the user what the article is about. In contrast, our work focuses on generating a summary that briefs the user on information in which he has indicated interest. Such briefings pull together information of interest from multiple sources, aggregating it to provide generalizations, similarities, and differences across articles, as well as changes in perspective across time. Briefings do not necessarily fully summarize the articles retrieved, but they update the user on the information he has specified as being of interest. * Department of Computer Science, 450 Computer Science Building, Columbia University, New York, NY 10027. E-mail: {radev, kathy}@cs.columbia.edu. © 1998 Association for Computational Linguistics. Computational Linguistics, Volume 24, Number 3.</Paragraph>
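The single-article, statistical extraction approach contrasted above can be illustrated with a minimal sketch. This is not the method of any cited system. The word-frequency scoring, the tiny stopword list, and the function names `tokenize` and `extract_summary` are all illustrative assumptions. Each sentence is scored by the mean frequency of its content words within the article, and the top k sentences are returned in document order.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an illustrative assumption).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "was", "for", "by", "at"}


def tokenize(text: str) -> list[str]:
    """Return lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def extract_summary(article: str, k: int = 2) -> list[str]:
    """Return the k highest-scoring sentences of one article, in document order.

    Scoring is an assumed word-frequency heuristic: a sentence scores the
    mean in-article frequency of its content words.
    """
    # Naive sentence splitting: a run of non-terminal characters plus its punctuation.
    sentences = [s.strip() for s in re.findall(r"[^.!?]+[.!?]*", article) if s.strip()]
    freq = Counter(tokenize(article))

    def score(sentence: str) -> float:
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score; sorted() is stable, so ties keep document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    # Keep the top k and restore their original order in the article.
    return [sentences[i] for i in sorted(ranked[:k])]


article = (
    "Bombing hit the city. The bombing killed ten people. "
    "Weather was mild. Police blamed the bombing on rebels."
)
print(extract_summary(article))
# prints ['Bombing hit the city.', 'The bombing killed ten people.']
```

Because the scores depend only on word frequencies inside one article, an extractor of this kind cannot relate facts across sources or track changes over time. That is the gap the briefings described here are meant to fill.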
    <Paragraph position="5"> Several characteristics distinguish a briefing from the general concept of a summary. Briefings are used to keep a person up to date on a certain event; thus, they need to convey information about the event using appropriate historical references and the context of prior news.</Paragraph>
    <Paragraph position="6"> Briefings focus on the types of information in the source text in which the reader has expressed interest. They deliberately ignore facts that are tangential to the user's interests, whether or not these facts are the focus of the article. In other words, briefings are more user-centered than general summaries: the latter convey information that the writer considered important, whereas briefings are based on information that the user is looking for.</Paragraph>
    <Paragraph position="7"> We present a system called SUMMONS¹ (McKeown and Radev 1995; Radev 1996; Radev and McKeown 1997), shown in Figure 1, which introduces novel techniques in the following areas: […] obtained from on-line sources.</Paragraph>
    <Paragraph position="8"> As can be expected of a knowledge-based summarization system, SUMMONS works in a restricted domain. We have chosen the domain of news on terrorism for several reasons. First, there is already a large body of related research in information extraction, knowledge representation, and text planning in the domain of terrorism. For example, earlier systems developed under the DARPA Message Understanding Conference (MUC) addressed the terrorism domain, so we can build on these systems without having to start from scratch. Second, the domain is important to a variety of users, including casual news readers, journalists, and security analysts. Finally, SUMMONS is being developed as part of a general environment for illustrated briefings over live multimedia information (Aho et al. 1997). Of all the MUC domains, terrorism is more likely than the others that were explored, such as mergers and acquisitions or management succession, to have a variety of related images.</Paragraph>
    <Paragraph position="9"> In order to extract information of interest to the user, SUMMONS makes use of components from several MUC systems. The output of such modules is in the form of</Paragraph>
  </Section>
</Paper>