File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/99/p99-1071_intro.xml
Size: 5,673 bytes
Last Modified: 2025-10-06 14:06:55
<?xml version="1.0" standalone="yes"?> <Paper uid="P99-1071"> <Title>Information Fusion in the Context of Multi-Document Summarization</Title> <Section position="2" start_page="0" end_page="550" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Information overload has created an acute need for summarization. Typically, the same information is described by many different online documents. Hence, summaries that synthesize common information across documents and emphasize the differences would significantly help readers. Such a summary would be beneficial, for example, to a user who follows a single event through several newswires. In this paper, we present research on the automatic fusion of similar information across multiple documents using language generation to produce a concise summary. We propose a method for summarizing a specific type of input: news articles presenting different descriptions of the same event. Hundreds of news stories on the same event are produced daily by news agencies. Repeated information about the event is a good indicator of its importance to the event, and can be used for summary generation.</Paragraph> <Paragraph position="1"> Most research on single document summarization, particularly for domain independent tasks, uses sentence extraction to produce a summary (Lin and Hovy, 1997; Marcu, 1997; Salton et al., 1991). 
In the case of multi-document summarization of articles about the same event, the original articles can include both similar and contradictory information.</Paragraph> <Paragraph position="2"> Extracting all similar sentences would produce a verbose and repetitive summary, while extracting some similar sentences could produce a summary biased towards some sources.</Paragraph> <Paragraph position="3"> Instead, we move beyond sentence extraction, using a comparison of extracted similar sentences to select the phrases that should be included in the summary and sentence generation to reformulate them as new text. Our work is part of a full summarization system (McKeown et al., 1999), which extracts sets of similar sentences, themes (Eskin et al., 1999), in the first stage for input to the components described here.</Paragraph> <Paragraph position="4"> Our model for multi-document summarization represents a number of departures from traditional language generation. Typically, language generation systems have access to a full semantic representation of the domain. A content planner selects and orders propositions from an underlying knowledge base to form text content. A sentence planner determines how to combine propositions into a single sentence, and a sentence generator realizes each set of combined propositions as a sentence, mapping from concepts to words and building syntactic structure. Our approach differs in the following ways: Content planning operates over full sentences, producing sentence fragments. Thus, content planning straddles the border between interpretation and generation. We preprocess the similar sentences using an existing shallow parser (Collins, 1996) and a mapping to predicate-argument structure. The content planner finds an intersection of phrases by comparing the predicate-argument structures; through this process it selects the phrases that can adequately convey the common information of the theme. 
It also orders selected phrases and augments them with information needed for clarification (entity descriptions, temporal references, and newswire source references).</Paragraph> <Paragraph position="5"> [Figure 1: example summary produced from 12 news articles as input] On 3th of September 1995, 120 hostages were released by Bosnian Serbs. Serbs were holding over 250 U.N. personnel. Bosnian serb leader Radovan Karadjic said he expected &quot;a sign of goodwill&quot; from the international community. U.S. F-16 fighter jet was shot down by Bosnian Serbs. Electronic beacon signals, which might have been transmitted by a downed U.S. fighter pilot in Bosnia, were no longer being received. After six days, O'Grady, downed pilot, was rescued by Marine force. The mission was carried out by CH-53 helicopters with an escort of [...]</Paragraph> <Paragraph position="6"> Sentence generation begins with phrases. Our task is to produce fluent sentences that combine these phrases, arranging them in novel contexts. In this process, new grammatical constraints may be imposed and paraphrasing may be required.</Paragraph> <Paragraph position="7"> We developed techniques to map the predicate-argument structure produced by the content planner to the functional representation expected by FUF/SURGE (Elhadad, 1993; Robin, 1994) and to integrate new constraints on realization choice, using surface features in place of the semantic or pragmatic ones typically used in sentence generation. An example summary automatically generated by the system from our corpus of themes is shown in Figure 1. We collected a corpus of themes, which was divided into a training portion and a testing portion. We used the training data to identify the paraphrasing rules on which our comparison algorithm is built. 
The system we describe has been fully implemented and tested on a variety of input articles; there are, of course, many open research issues that we are continuing to explore.</Paragraph> <Paragraph position="8"> In the following sections, we provide an overview of existing multi-document summarization systems, then detail our sentence comparison technique and describe the sentence generation component. We provide examples of generated summaries and conclude with a discussion of evaluation.</Paragraph> </Section> </Paper>