<?xml version="1.0" standalone="yes"?> <Paper uid="P05-3013"> <Title>Language Independent Extractive Summarization</Title> <Section position="5" start_page="50" end_page="51" type="evalu"> <SectionTitle> 3 Evaluation </SectionTitle> <Paragraph position="0"> English document summarization experiments are run using the summarization test collection provided within the framework of the Document Understanding Conference (DUC). In particular, we use the data set of 567 news articles made available during the DUC 2002 evaluations (DUC, 2002), together with the corresponding 100-word summaries generated for each of these documents.</Paragraph> <Paragraph position="1"> This is the single-document summarization task undertaken by other systems participating in the DUC 2002 document summarization evaluations.</Paragraph> <Paragraph position="2"> To test the language independence of the algorithm, in addition to the English test collection we also use a Brazilian Portuguese data set consisting of 100 news articles and their corresponding manually produced summaries. We use the TeMário test collection (Pardo and Rino, 2003), which contains newspaper articles from online Brazilian newswires: 40 documents from Jornal do Brasil and 60 documents from Folha de São Paulo. The documents were selected to cover a variety of domains (e.g., world, politics, foreign affairs, editorials), and the manual summaries were produced by an expert in Brazilian Portuguese. Unlike the summaries produced for the English DUC documents, which had a length requirement of approximately 100 words, the length of the summaries in the TeMário data set is constrained relative to the length of the corresponding documents, i.e.,
a summary has to account for about 25-30% of the original document.</Paragraph> <Paragraph position="3"> Consequently, the automatic summaries generated for the documents in this collection are not restricted to 100 words, as in the English experiments, but are required to have a length comparable to that of the corresponding manual summaries, to ensure a fair evaluation. For evaluation, we use the ROUGE evaluation toolkit, a method based on N-gram statistics that has been found to be highly correlated with human evaluations (Lin and Hovy, 2003). The evaluation uses the Ngram(1,1) setting of ROUGE (i.e., unigram overlap), which was found to have the highest correlation with human judgments, at a confidence level of 95%.</Paragraph> <Paragraph position="4"> Table 2 shows the results obtained on these two data sets for different graph settings. The table also lists baseline results, obtained on summaries generated by taking the first sentences of each document. By way of comparison, the best participating system in DUC 2002 was a supervised system that achieved a ROUGE score of 0.5011.</Paragraph> <Paragraph position="5"> For both data sets, TextRank applied to a directed backward graph structure outperforms a simple (but powerful) baseline.</Paragraph> <Paragraph position="6"> These results show that graph-based ranking algorithms, previously found successful in Web link analysis and social networks, can be turned into a state-of-the-art tool for extractive summarization when applied to graphs extracted from texts. Moreover, due to its unsupervised nature, the algorithm was also shown to be language independent, leading to similar results and similar improvements over baseline techniques when applied to documents in different languages. More extensive experimental results with the TextRank system are reported in (Mihalcea and Tarau, 2004) and (Mihalcea, 2004).</Paragraph> </Section> </Paper>