File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/05/i05-2004_metho.xml
Size: 20,041 bytes
Last Modified: 2025-10-06 14:09:34
<?xml version="1.0" standalone="yes"?> <Paper uid="I05-2004"> <Title>A Language Independent Algorithm for Single and Multiple Document Summarization</Title> <Section position="3" start_page="0" end_page="20" type="metho"> <SectionTitle> 2 Iterative Graph-based Algorithms for </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="19" type="sub_section"> <SectionTitle> Extractive Summarization </SectionTitle> <Paragraph position="0"> In this section, we briefly describe two graph-based ranking algorithms and their application to the task of extractive summarization. Ranking algorithms, such as Kleinberg's HITS algorithm (Kleinberg, 1999) or Google's PageRank (Brin and Page, 1998), have traditionally been used with success in Web-link analysis (Brin and Page, 1998), social networks, and more recently in text processing applications (Mihalcea and Tarau, 2004), (Mihalcea et al., 2004), (Erkan and Radev, 2004). In short, a graph-based ranking algorithm is a way of deciding on the importance of a vertex within a graph, by taking into account global information recursively computed from the entire graph, rather than relying only on local vertex-specific information. The basic idea implemented by the ranking model is that of &quot;voting&quot; or &quot;recommendation&quot;. When one vertex links to another one, it is basically casting a vote for that other vertex. The higher the number of votes that are cast for a vertex, the higher the importance of the vertex.</Paragraph> <Paragraph position="1"> Let G = (V, E) be a directed graph with the set of vertices V and set of edges E, where E is a subset of V × V. For a given vertex Vi, let In(Vi) be the set of vertices that point to it (predecessors), and let Out(Vi) be the set of vertices that vertex Vi points to (successors).</Paragraph> <Paragraph position="2"> PageRank. 
PageRank (Brin and Page, 1998) is perhaps one of the most popular ranking algorithms, and was designed as a method for Web link analysis. Unlike other graph ranking algorithms, PageRank integrates the impact of both incoming and outgoing links into one single model, and therefore it produces only one set of scores: $$PR(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{PR(V_j)}{|Out(V_j)|}$$</Paragraph> <Paragraph position="4"> where d is a parameter set between 0 and 1.</Paragraph> <Paragraph position="5"> HITS. HITS (Hyperlink-Induced Topic Search) (Kleinberg, 1999) is an iterative algorithm that was designed for ranking Web pages according to their degree of &quot;authority&quot;. The HITS algorithm makes a distinction between &quot;authorities&quot; (pages with a large number of incoming links) and &quot;hubs&quot; (pages with a large number of outgoing links). For each vertex, HITS produces two sets of scores - an &quot;authority&quot; score, and a &quot;hub&quot; score: $$HITS_A(V_i) = \sum_{V_j \in In(V_i)} HITS_H(V_j), \qquad HITS_H(V_i) = \sum_{V_j \in Out(V_i)} HITS_A(V_j)$$</Paragraph> <Paragraph position="7"> For each of these algorithms, starting from arbitrary values assigned to each node in the graph, the computation iterates until convergence below a given threshold is achieved. After running the algorithm, a score is associated with each vertex, which represents the &quot;importance&quot; or &quot;power&quot; of that vertex within the graph.</Paragraph> <Paragraph position="8"> In the context of Web surfing or citation analysis, it is unusual for a vertex to include multiple or partial links to another vertex, and hence the original definition for graph-based ranking algorithms assumes unweighted graphs. However, when the graphs are built starting with natural language texts, they may include multiple or partial links between the units (vertices) that are extracted from text. It may therefore be useful to integrate into the model the &quot;strength&quot; of the connection between two vertices Vi and Vj as a weight wij added to the corresponding edge that connects the two vertices. 
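A minimal Python sketch of a weighted variant of the PageRank iteration is given here, under stated assumptions: the function name `weighted_pagerank`, the dense matrix layout `weights[j][i]` (weight of the edge from vertex j to vertex i), the default `d = 0.85`, the tolerance, and the iteration cap are illustrative choices that the text does not specify. Each vertex splits its vote among its successors in proportion to the edge weights, and the loop stops once the largest score change falls below the tolerance.

```python
def weighted_pagerank(weights, d=0.85, tol=1e-6, max_iter=200):
    """Iterate weighted PageRank scores until they stop changing.

    weights[j][i] is the non-negative weight of the edge from vertex j
    to vertex i, with 0.0 meaning "no edge" (an assumed representation).
    d, tol and max_iter are assumed defaults, not values from the paper.
    """
    n = len(weights)
    # Total outgoing weight of each vertex, used to normalise its votes.
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n  # arbitrary starting values
    for _ in range(max_iter):
        new_scores = []
        for i in range(n):
            incoming = 0.0
            for j in range(n):
                # A positive weight implies out_sum[j] is positive too.
                if weights[j][i] > 0.0:
                    incoming += weights[j][i] / out_sum[j] * scores[j]
            new_scores.append((1.0 - d) + d * incoming)
        change = max(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if tol > change:
            break
    return scores
```

An unweighted graph is the special case in which every existing edge has weight 1.0, so the same function covers both settings.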
The ranking algorithms are thus adapted to include edge weights, e.g. for PageRank the score is determined using the following formula (a similar change can be applied to the HITS algorithm): $$PR^W(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} PR^W(V_j)$$</Paragraph> <Paragraph position="10"> [1] Watching the new movie, &quot;Imagine: John Lennon,&quot; was very painful for the late Beatle's wife, Yoko Ono.</Paragraph> <Paragraph position="11"> [2] &quot;The only reason why I did watch it to the end is because I'm responsible for it, even though somebody else made it,&quot; she said.</Paragraph> <Paragraph position="12"> [3] Cassettes, film footage and other elements of the acclaimed movie were collected by Ono.</Paragraph> <Paragraph position="13"> [4] She also took cassettes of interviews by Lennon, which were edited in such a way that he narrates the picture.</Paragraph> <Paragraph position="14"> [5] Andrew Solt (&quot;This Is Elvis&quot;) directed, Solt and David L. Wolper produced and Solt and Sam Egan wrote it.</Paragraph> <Paragraph position="15"> [6] &quot;I think this is really the definitive documentary of John ... [Figure 1: weighted graph built for the sample text. Scores reflecting sentence importance are shown in brackets next to each sentence.]</Paragraph> <Paragraph position="16"> While the final vertex scores (and therefore rankings) for weighted graphs differ significantly from those of their unweighted alternatives, the number of iterations to convergence and the shape of the convergence curves are almost identical for weighted and unweighted graphs.</Paragraph> </Section> <Section position="2" start_page="19" end_page="20" type="sub_section"> <SectionTitle> 2.1 Single Document Summarization </SectionTitle> <Paragraph position="0"> For the task of single-document extractive summarization, the goal is to rank the sentences in a given text with respect to their importance for the overall understanding of the text. 
A graph is therefore constructed by adding a vertex for each sentence in the text, and edges between vertices are established using sentence inter-connections. These connections are defined using a similarity relation, where &quot;similarity&quot; is measured as a function of content overlap. Such a relation between two sentences can be seen as a process of &quot;recommendation&quot;: a sentence that addresses certain concepts in a text gives the reader a &quot;recommendation&quot; to refer to other sentences in the text that address the same concepts, and therefore a link can be drawn between any two such sentences that share common content.</Paragraph> <Paragraph position="1"> The overlap of two sentences can be determined simply as the number of common tokens between the lexical representations of two sentences, or it can be run through syntactic filters, which only count words of a certain syntactic category. Moreover, to avoid promoting long sentences, we use a normalization factor, and divide the content overlap of two sentences by the length of each sentence.</Paragraph> <Paragraph position="2"> The resulting graph is highly connected, with a weight associated with each edge, indicating the strength of the connections between various sentence pairs in the text. The graph can be represented as: (a) a simple undirected graph; (b) a directed weighted graph with the orientation of edges set from a sentence to sentences that follow in the text (directed forward); or (c) a directed weighted graph with the orientation of edges set from a sentence to previous sentences in the text (directed backward).</Paragraph> <Paragraph position="3"> After the ranking algorithm is run on the graph, sentences are sorted in decreasing order of their score, and the top ranked sentences are selected for inclusion in the extractive summary. 
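The graph construction and selection steps described above can be sketched in Python as follows. This is an illustration under assumptions, not the authors' implementation: the names `overlap_similarity`, `build_graph` and `top_sentences` are hypothetical, tokens are obtained by lowercasing and splitting on whitespace, common tokens are counted as distinct shared words, and the normalization divides by the sum of the two sentence lengths (the text says only that the overlap is divided by the length of each sentence, so the exact form is a guess).

```python
def overlap_similarity(tokens_a, tokens_b):
    """Distinct shared tokens, normalised by the combined sentence length.

    The sum-of-lengths normaliser is an assumption; the paper does not
    give the exact form of its normalization factor.
    """
    common = len(set(tokens_a).intersection(tokens_b))
    total = len(tokens_a) + len(tokens_b)
    return common / total if total > 0 else 0.0


def build_graph(sentences, direction="undirected"):
    """Return weights[i][j], the weight of the edge from sentence i to j.

    direction is "undirected", "forward" (edges only to later sentences)
    or "backward" (edges only to earlier sentences).
    """
    # Whitespace tokenisation is an assumed simplification.
    tokens = [s.lower().split() for s in sentences]
    n = len(tokens)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if direction == "forward" and not j > i:
                continue
            if direction == "backward" and not i > j:
                continue
            weights[i][j] = overlap_similarity(tokens[i], tokens[j])
    return weights


def top_sentences(scores, k):
    """Indices of the k highest-scoring sentences, in document order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])
```

The weight matrix can be passed to any graph-based ranker that accepts edge weights; returning the selected indices in document order is a presentation choice made for this sketch.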
Figure 1 shows an example of a weighted graph built for a sample text of six sentences.</Paragraph> </Section> <Section position="3" start_page="20" end_page="20" type="sub_section"> <SectionTitle> 2.2 Multiple Document Summarization </SectionTitle> <Paragraph position="0"> Multi-document summaries are built using a &quot;meta&quot; summarization procedure. First, for each document in a given cluster of documents, a single document summary is generated using one of the graph-based ranking algorithms. Next, a &quot;summary of summaries&quot; is produced using the same or a different ranking algorithm. Figure 2 illustrates the metasummarization process used to generate a multi-document summary starting with a cluster of N documents.</Paragraph> <Paragraph position="1"> Unlike single documents - where sentences with highly similar content are very rarely if at all encountered - it is often the case that clusters of multiple documents, all addressing the same or related topics, would contain very similar or even identical sentences. To avoid such pairs of sentences, which may decrease the readability and the amount of information conveyed by a summary, we introduce a maximum threshold on the sentence similarity measure. Consequently, in the graph construction stage, no link (edge) is added between sentences (vertices) whose similarity exceeds this threshold. In</Paragraph> </Section> </Section> <Section position="4" start_page="20" end_page="21" type="metho"> <SectionTitle> 3 Materials and Evaluation Methodology </SectionTitle> <Paragraph position="0"> Single and multiple English document summarization experiments are run using the summarization test collection provided in the framework of the Document Understanding Conference (DUC). 
In particular, we use the data set of 567 news articles made available during the DUC 2002 evaluations (DUC, 2002), and the corresponding 100-word summaries generated for each of these documents (single-document summarization), or the 100-word summaries generated for each of the 59 document clusters formed on the same data set (multi-document summarization). These are the summarization tasks undertaken by other systems participating in the DUC 2002 document summarization evaluations.</Paragraph> <Paragraph position="1"> To test the language independence aspect of the algorithm, in addition to the English test collection, we also use a Brazilian Portuguese data set consisting of 100 news articles and their corresponding manually produced summaries. We use the TeMário test collection (Pardo and Rino, 2003), containing newspaper articles from online Brazilian newswire: 40 documents from Jornal do Brasil and 60 documents from Folha de São Paulo. The documents were selected to cover a variety of domains (e.g. world, politics, foreign affairs, editorials), and manual summaries were produced by an expert in Brazilian Portuguese. Unlike the summaries produced for the English DUC documents, which had a length requirement of approximately 100 words, the length of the summaries in the TeMário data set is constrained relative to the length of the corresponding documents, i.e. a summary has to account for about 25-30% of the original document. Consequently, the automatic summaries generated for the documents in this collection are not restricted to 100 words, as in the English experiments, but are required to have a length comparable to the corresponding manual summaries, to ensure a fair evaluation. For evaluation, we use the ROUGE evaluation toolkit, which is a method based on N-gram statistics, found to be highly correlated with human evaluations (Lin and Hovy, 2003a). 
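To make the idea of an N-gram statistic concrete, the following Python sketch computes a unigram-overlap recall between an automatic summary and a manual one. It is not the ROUGE toolkit and makes no claim to reproduce its scores: the function name `unigram_recall`, the whitespace tokenisation, the lowercasing, and the clipped-count matching are assumptions made for illustration only.

```python
from collections import Counter


def unigram_recall(candidate, reference):
    """Share of reference unigrams that also occur in the candidate.

    Each reference word is matched at most as many times as it appears
    in the candidate (clipped counts). This is an illustrative measure,
    not the scoring procedure of the ROUGE toolkit.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    total = sum(ref.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, cand[word]) for word, count in ref.items())
    return matched / total
```

For example, with the candidate "the cat sat on the mat" and the reference "the cat lay on a mat", four of the six reference words are matched, giving a recall of 4/6.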
The evaluation is done using the Ngram(1,1) setting of ROUGE, which was found to have the highest correlation with human judgments, at a confidence level of 95%.</Paragraph> </Section> <Section position="5" start_page="21" end_page="22" type="metho"> <SectionTitle> 4 Experimental Results </SectionTitle> <Paragraph position="0"> The extractive summarization algorithm is evaluated in the context of: (1) A single-document summarization task, where a summary is generated for each of the 567 English news articles provided during the Document Understanding Evaluations 2002 (DUC, 2002), and for each of the 100 Portuguese documents in the TeMário data set; and (2) A multi-document summarization task, where a summary is generated for each of the 59 document clusters in the DUC 2002 data. Since document clusters and multi-document summaries are not available for the Portuguese documents, a multi-document summarization evaluation could not be conducted on this data set. Note however that the multi-document summarization tool is based on the single-document summarization method (see Figure 2), and thus high performance in single-document summarization is expected to result in a similar level of performance in multi-document summarization.</Paragraph> <Section position="1" start_page="21" end_page="21" type="sub_section"> <SectionTitle> 4.1 Single Document Summarization for English </SectionTitle> <Paragraph position="0"> For single-document summarization, we evaluate the extractive summaries produced using each of the two graph-based ranking algorithms described in Section 2 (HITS and PageRank). Table 1 shows the results obtained for the automatically generated 100-word summaries for the English DUC 2002 data set. 
The table shows results using the two graph algorithms described in Section 2 when using graphs that are: (a) undirected, (b) directed forward, or (c) directed backward [2].</Paragraph> <Paragraph position="1"> For a comparative evaluation, Table 2 shows the results obtained on this data set by the top 5 (out of ...) performing systems in the single document summarization task at DUC 2002. It also lists the baseline performance, computed for 100-word summaries generated by taking the first sentences in each article.</Paragraph> <Paragraph position="2"> [2] ... redundant, since the &quot;hub&quot; variation of the HITS algorithm can be derived from its &quot;authority&quot; counterpart by reversing the edge orientation in the graphs.</Paragraph> <Paragraph position="3"> Top 5 systems (DUC, 2002)</Paragraph> </Section> <Section position="2" start_page="21" end_page="21" type="sub_section"> <SectionTitle> 4.2 Single Document Summarization for Portuguese </SectionTitle> <Paragraph position="0"> The single-document summarization tool was also evaluated on the TeMário collection of Portuguese newspaper articles. We used the same graph settings as in the English experiments: graph-based ranking algorithms consisting of either HITS or PageRank, relying on graphs that are undirected, directed forward, or directed backward. As mentioned in Section 3, the length of each automatically generated summary was constrained to match the length of the corresponding manual summary, for a fair comparison. Table 3 shows the results obtained on this data set, evaluated using the ROUGE evaluation toolkit. 
A baseline was also computed, using the first sentences in each document, and evaluated</Paragraph> </Section> <Section position="3" start_page="21" end_page="22" type="sub_section"> <SectionTitle> 4.3 Multiple Document Summarization </SectionTitle> <Paragraph position="0"> We evaluate multi-document summaries generated using combinations of the graph-based ranking algorithms that were found to work best in the single document summarization experiments, namely PageRankW and HITSWA, on undirected or directed backward graphs. Although the single document summaries used in the &quot;meta&quot; summarization process may conceivably be of any size, in this evaluation their length is limited to 100 words.</Paragraph> <Paragraph position="1"> As mentioned earlier, different graph algorithms can be used for producing the single document summary and the &quot;meta&quot; summary; Table 4 lists the results for multi-document summarization experiments using various combinations of graph algorithms. For comparison, Table 5 lists the results obtained by the top 5 (out of 9) performing systems in the multi-document summarization task at DUC 2002, and a baseline generated by taking the first sentence in each article.</Paragraph> <Paragraph position="2"> Since no multi-document clusters and associated summaries were available for the other language considered in our experiments, the multi-document summarization experiments were conducted only on the English data set. 
However, since the multi-document summarization technique consists of a layered application of single-document summarization, we believe that the performance achieved in single-document summarization for Portuguese would result in similar performance figures when applied to the summarization of clusters of documents.</Paragraph> <Paragraph position="3"> Top 5 systems (DUC, 2002)</Paragraph> </Section> <Section position="4" start_page="22" end_page="22" type="sub_section"> <SectionTitle> 4.4 Discussion </SectionTitle> <Paragraph position="0"> The graph-based extractive summarization algorithm succeeds in identifying the most important sentences in a text (or collection of texts) based on information exclusively drawn from the text itself.</Paragraph> <Paragraph position="1"> Unlike supervised systems, which attempt to learn what makes a good summary by training on collections of summaries built for other articles, the graph-based method is fully unsupervised, and relies only on the given texts to derive an extractive summary.</Paragraph> <Paragraph position="2"> For single document summarization, the HITSWA and PageRankW algorithms, run on a graph structure encoding a backward direction across sentence relations, provide the best performance. These results are consistent across languages - with similar performance figures observed on both the English DUC data set and on the Portuguese TeMário data set. 
The setting that consistently exceeds the baseline by a large margin is PageRankW on a directed backward graph, with clear improvements over the simple (but powerful) first-sentence selection baseline.</Paragraph> <Paragraph position="3"> Moreover, comparative evaluations performed with respect to other systems participating in the DUC 2002 evaluations revealed that the performance of the graph-based extractive summarization method is competitive with state-of-the-art summarization systems.</Paragraph> <Paragraph position="4"> Interestingly, the &quot;directed forward&quot; setting consistently performs worse than the baseline, which can be explained by the fact that both data sets consist of newspaper articles, which tend to concentrate the most important facts toward the beginning of the document, and therefore disfavor a forward direction set across sentence relations.</Paragraph> <Paragraph position="5"> For multiple document summarization, the best &quot;meta&quot; summarizer is the PageRankW algorithm applied on undirected graphs, in combination with a single summarization system using the HITSWA ranking algorithm, for a performance similar to that of the best system in the DUC 2002 multi-document summarization task.</Paragraph> <Paragraph position="6"> The results obtained during all these experiments show that graph-based ranking algorithms, previously found successful in Web link analysis and social networks, can be turned into a state-of-the-art tool for extractive summarization when applied to graphs extracted from texts. 
Moreover, the method was also shown to be language independent, leading to similar results when applied to the summarization of documents in different languages.</Paragraph> <Paragraph position="7"> The better results obtained by algorithms like HITSWA and PageRank on graphs containing only backward edges are likely to come from the fact that recommendations flowing toward the beginning of the text take advantage of the bias giving higher summarizing value to sentences occurring at the beginning of the document.</Paragraph> <Paragraph position="8"> Another important aspect of the method is that it gives a ranking over all sentences in a text (or a collection of texts) - which means that it can be easily adapted to extracting very short summaries, or longer, more explanatory summaries.</Paragraph> </Section> </Section> </Paper>