File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/relat/99/p99-1071_relat.xml
Size: 3,216 bytes
Last Modified: 2025-10-06 14:16:10
<?xml version="1.0" standalone="yes"?> <Paper uid="P99-1071"> <Title>Information Fusion in the Context of Multi-Document Summarization</Title> <Section position="3" start_page="550" end_page="551" type="relat"> <SectionTitle> 2 Related Work </SectionTitle> <Paragraph position="0"> Automatic summarizers typically identify and extract the most important sentences from an input article. A variety of approaches exist for determining the salient sentences in the text: statistical techniques based on word distribution (Salton et al., 1991), symbolic techniques based on discourse structure (Marcu, 1997), and semantic relations between words (Barzilay and Elhadad, 1997). Extraction techniques can work only if summary sentences already appear in the article. Extraction cannot handle the task we address, because summarization of multiple documents requires information about similarities and differences across articles.</Paragraph> <Paragraph position="1"> While most summarization work has focused on single articles, a few initial projects have started to study multi-document summarization. In constrained domains, e.g., terrorism, a coherent summary of several articles can be generated when a detailed semantic representation of the source text is available. For example, information extraction systems can be used to interpret the source text. In this framework, Radev and McKeown (1998) use generation techniques to highlight changes over time across input articles about the same event. In an arbitrary domain, statistical techniques are used to identify similarities and differences across documents. Some approaches directly exploit word distribution in the text (Salton et al., 1991; Carbonell and Goldstein, 1998). Recent work (Mani and Bloedorn, 1997) exploits semantic relations between text units for content representation, such as synonymy and co-reference.
A spreading activation algorithm and graph matching are used to identify similarities and differences across documents. The output is presented as a set of paragraphs with similar and unique words highlighted. However, if the same information is mentioned several times in different documents, much of the summary will be redundant. While some researchers address this problem by selecting a subset of the repetitions (Carbonell and Goldstein, 1998), this approach is not always satisfactory. As we will see in the next section, we can both eliminate redundancy from the output and retain balance through the selection of common information.</Paragraph> <Paragraph position="2"> On Friday, a U.S. F-16 fighter jet was shot down by Bosnian Serb missile while policing the no-fly zone over the region.</Paragraph> <Paragraph position="3"> A Bosnian Serb missile shot down a U.S. F-16 over northern Bosnia on Friday.</Paragraph> <Paragraph position="4"> On the eve of the meeting, a U.S. F-16 fighter was shot down while on a routine patrol over northern Bosnia. O'Grady's F-16 fighter jet, based in Aviano, Italy, was shot down by a Bosnian Serb SA-6 anti-aircraft missile last Friday and hopes had diminished for finding him alive despite intermittent electronic signals from the area</Paragraph> </Section> </Paper>