File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/04/w04-1017_intro.xml

Size: 3,253 bytes

Last Modified: 2025-10-06 14:02:33

<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-1017">
  <Title>Event-Based Extractive Summarization</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The main goal of extractive summarization can be concisely formulated as extracting from the input those pieces of text that contain information about the most important concepts mentioned in the input text or texts. This definition conceals many important issues that should be taken into consideration in the process of summary construction. First, it is necessary to identify the important concepts that should be described in the summary. Once those concepts are identified, the process of summarization can be presented as: 1. Break the input text into textual units (sentences, paragraphs, etc.).</Paragraph>
    <Paragraph position="1"> 2. See what concepts each textual unit covers.</Paragraph>
    <Paragraph position="2"> 3. Choose a particular textual unit for the output according to the concepts present in all textual units.</Paragraph>
    <Paragraph position="3"> 4. Continue choosing textual units until reaching the desired length of the summary.</Paragraph>
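The four steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the system described in this paper: the function names (split_into_units, concepts_of, summarize) are hypothetical, sentences serve as the textual units, lowercase words stand in for concepts, and a unit is scored by how often its concepts occur across all units.

```python
import re
from collections import Counter


def split_into_units(text):
    """Step 1: break the input into textual units (assumed here: sentences)."""
    pieces = re.findall(r"[^.!?]+[.!?]?", text)
    return [p.strip() for p in pieces if p.strip()]


def concepts_of(unit):
    """Step 2: list the concepts a unit covers (assumed here: lowercase words)."""
    return set(re.findall(r"[a-z]+", unit.lower()))


def summarize(text, max_units):
    """Steps 3 and 4: pick the best-scoring units until the length limit."""
    units = split_into_units(text)
    unit_concepts = [concepts_of(u) for u in units]
    # Weight each concept by the number of units that mention it.
    weights = Counter(c for cs in unit_concepts for c in cs)
    chosen = []
    remaining = set(range(len(units)))
    while remaining and max_units > len(chosen):
        # Ties are broken in favour of the earliest unit.
        best = max(sorted(remaining),
                   key=lambda i: sum(weights[c] for c in unit_concepts[i]))
        chosen.append(best)
        remaining.remove(best)
    # Present the selected units in their original order.
    return [units[i] for i in sorted(chosen)]
```

Note that this simple scoring never discounts concepts already covered by earlier picks, so it can select near-duplicate units; that is the repetition problem that a clustering step or a repetition-aware selection algorithm is meant to address.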
    <Paragraph position="4"> Some current summarization systems add a clustering step, replacing the analysis of all the textual units with the analysis of representative units from each cluster. Clustering is helpful for avoiding repetitions in the summary.</Paragraph>
    <Paragraph position="5"> In this paper we propose a new representation for concepts and correspondingly a new feature on which summarization can be based. We adapt the algorithm we proposed earlier (Filatova and Hatzivassiloglou, 2003) for assigning to each sentence a list of low-level, atomic events. These events capture information about important named entities for the input text or texts, and the relationships between these named entities. We also discuss a general model which treats summarization as a three-component problem, involving the identification of the textual units into which the input text should be broken and which are later used as the constituent parts of the final summary, the textual features which are associated with the important concepts described in the input text, and the appropriate algorithm for selecting the textual units to be included in the summary.</Paragraph>
    <Paragraph position="6"> We focus on the latter two of those steps and explore interdependencies between the choice of features (step 2) and selection algorithm (step 3). We experimentally test our hypothesis that event-based features are helpful for summarization by comparing the performance of three sentence selection algorithms when we use such features versus the case where we use another, widely used set of textual features: the words in the input texts, weighted by their tf*idf scores. The results establish that for the majority of document sets in our test collection, events outperform tf*idf for all algorithms considered. Furthermore, we show that this benefit is more pronounced when the selection algorithm includes steps to address potential repetition of information in the output summary.</Paragraph>
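For readers unfamiliar with the baseline feature, the following sketch shows one common way to compute tf*idf word weights. It is an assumption-laden illustration rather than the exact weighting used in our experiments: the function name tf_idf is hypothetical, tokenization is plain whitespace splitting, and idf is taken as log(N / df) over the given documents, which is only one of several variants in use.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # term frequency within this document
        weights.append({w: tf[w] * math.log(n_docs / df[w]) for w in tf})
    return weights
```

Under this variant a word that occurs in every document receives weight zero, so only words that distinguish a document from the rest of the collection contribute to a sentence's score.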
  </Section>
</Paper>