A Language Independent Algorithm for
Single and Multiple Document Summarization
Rada Mihalcea and Paul Tarau
Department of Computer Science and Engineering
University of North Texas
frada,taraug@cs.unt.edu
Abstract
This paper describes a method for lan-
guage independent extractive summariza-
tion that relies on iterative graph-based
ranking algorithms. Through evalua-
tions performed on a single-document
summarization task for English and Por-
tuguese, we show that the method per-
forms equally well regardless of the lan-
guage. Moreover, we show how a meta-
summarizer relying on a layered appli-
cation of techniques for single-document
summarization can be turned into an ef-
fective method for multi-document sum-
marization.
1 Introduction
Algorithms for extractive summarization are typi-
cally based on techniques for sentence extraction,
and attempt to identify the set of sentences that are
most important for the overall understanding of a
given document. Some of the most successful ap-
proaches consist of supervised algorithms that at-
tempt to learn what makes a good summary by
training on collections of summaries built for a rela-
tively large number of training documents, e.g. (Hi-
rao et al., 2002), (Teufel and Moens, 1997). How-
ever, the price paid for the high performance of
such supervised algorithms is their inability to eas-
ily adapt to new languages or domains, as new train-
ing data are required for each new data type. In
this paper, we show that a method for extractive
summarization relying on iterative graph-based al-
gorithms, as previously proposed in (Mihalcea and
Tarau, 2004) can be applied to the summarization
of documents in different languages without any re-
quirements for additional data. Additionally, we
also show that a layered application of this single-
document summarization method can result into an
efficient multi-document summarization tool.
Earlier experiments with graph-based ranking al-
gorithms for text summarization, as previously re-
ported in (Mihalcea and Tarau, 2004) and (Erkan
and Radev, 2004), were either limited to single-
document English summarization, or they were ap-
plied to English multi-document summarization,
but in conjunction with other extractive summariza-
tion techniques that did not allow for a clear evalua-
tion of the impact of the graph algorithms alone. In
this paper, we show that a method exclusively based
on graph-based algorithms can be successfully ap-
plied to the summarization of single and multiple
documents in any language, and show that the re-
sults are competitive with those of state-of-the-art
summarization systems.
The paper is organized as follows. Section 2
briefly overviews two iterative graph-based ranking
algorithms, and shows how these algorithms can be
applied to single and multiple document summa-
rization. Section 3 describes the data sets used in
the summarization experiments and the evaluation
methodology. Experimental results are presented in
Section 4, followed by discussions, pointers to re-
lated work, and conclusions.
2 Iterative Graph-based Algorithms for
Extractive Summarization
In this section, we shortly describe two graph-based
ranking algorithms and their application to the task
of extractive summarization. Ranking algorithms,
such as Kleinberg’s HITS algorithm (Kleinberg,
1999) or Google’s PageRank (Brin and Page, 1998),
have been traditionally and successfully used in
Web-link analysis (Brin and Page, 1998), social net-
works, and more recently in text processing appli-
cations (Mihalcea and Tarau, 2004), (Mihalcea et
al., 2004), (Erkan and Radev, 2004). In short, a
graph-based ranking algorithm is a way of decid-
ing on the importance of a vertex within a graph, by
taking into account global information recursively
computed from the entire graph, rather than relying
19
only on local vertex-specific information. The ba-
sic idea implemented by the ranking model is that
of “voting” or “recommendation”. When one vertex
links to another one, it is basically casting a vote for
that other vertex. The higher the number of votes
that are cast for a vertex, the higher the importance
of the vertex.
Let G = (V;E) be a directed graph with the set
of vertices V and set of edges E, where E is a sub-
set of V £V . For a given vertex Vi, let In(Vi) be the
set of vertices that point to it (predecessors), and let
Out(Vi) be the set of vertices that vertex Vi points
to (successors).
PageRank. PageRank (Brin and Page, 1998)
is perhaps one of the most popular ranking algo-
rithms, and was designed as a method for Web link
analysis. Unlike other graph ranking algorithms,
PageRank integrates the impact of both incoming
and outgoing links into one single model, and there-
fore it produces only one set of scores:
PR(Vi) = (1¡d)+d⁄ X
Vj2In(Vi)
PR(Vj)
jOut(Vj)j (1)
where d is a parameter set between 0 and 1.
HITS. HITS (Hyperlinked Induced Topic
Search) (Kleinberg, 1999) is an iterative algorithm
that was designed for ranking Web pages according
to their degree of “authority”. The HITS algo-
rithm makes a distinction between “authorities”
(pages with a large number of incoming links) and
“hubs” (pages with a large number of outgoing
links). For each vertex, HITS produces two sets
of scores – an “authority” score, and a “hub” score:
HITSA(Vi) = X
Vj2In(Vi)
HITSH(Vj) (2)
HITSH(Vi) = X
Vj2Out(Vi)
HITSA(Vj) (3)
For each of these algorithms, starting from arbitrary
values assigned to each node in the graph, the com-
putation iterates until convergence below a given
threshold is achieved. After running the algorithm,
a score is associated with each vertex, which rep-
resents the “importance” or “power” of that vertex
within the graph.
In the context of Web surfing or citation analy-
sis, it is unusual for a vertex to include multiple or
partial links to another vertex, and hence the orig-
inal definition for graph-based ranking algorithms
is assuming unweighted graphs. However, when
the graphs are built starting with natural language
texts, they may include multiple or partial links be-
tween the units (vertices) that are extracted from
text. It may be therefore useful to integrate into
the model the “strength” of the connection between
two vertices Vi and Vj as a weight wij added to
the corresponding edge that connects the two ver-
tices. The ranking algorithms are thus adapted to
include edge weights, e.g. for PageRank the score
is determined using the following formula (a similar
change can be applied to the HITS algorithm):
PRW(Vi) = (1¡d)+d⁄ X
Vj2In(Vi)
wji PR
W(Vj)
P
Vk2Out(Vj)
wkj
(4)
[1] Watching the new movie, “Imagine: John Lennon,” was
very painful for the late Beatle’s wife, Yoko Ono.
[2] “The only reason why I did watch it to the end is because
I’m responsible for it, even though somebody else made it,”
she said.
[3] Cassettes, film footage and other elements of the acclaimed
movie were collected by Ono.
[4] She also took cassettes of interviews by Lennon, which
were edited in such a way that he narrates the picture.
[5] Andrew Solt (“This Is Elvis”) directed, Solt and David L.
Wolper produced and Solt and Sam Egan wrote it.
[6] “I think this is really the definitive documentary of John
Lennon’s life,” Ono said in an interview.
1
2
3
4
5
6 0.16
0.30
0.46 0.15
[1.34]
[1.75]
[0.70]
[0.74]
[0.52]
[0.91]
0.15
0.29
0.32
0.15
Figure 1: Graph of sentence similarities built on a
sample text. Scores reflecting sentence importance
are shown in brackets next to each sentence.
While the final vertex scores (and therefore rank-
ings) for weighted graphs differ significantly as
compared to their unweighted alternatives, the num-
ber of iterations to convergence and the shape of the
convergence curves is almost identical for weighted
and unweighted graphs.
2.1 Single Document Summarization
For the task of single-document extractive summa-
rization, the goal is to rank the sentences in a given
text with respect to their importance for the overall
understanding of the text. A graph is therefore con-
structed by adding a vertex for each sentence in the
text, and edges between vertices are established us-
ing sentence inter-connections. These connections
20
are defined using a similarity relation, where “simi-
larity” is measured as a function of content overlap.
Such a relation between two sentences can be seen
as a process of “recommendation”: a sentence that
addresses certain concepts in a text gives the reader
a “recommendation” to refer to other sentences in
the text that address the same concepts, and there-
fore a link can be drawn between any two such sen-
tences that share common content.
The overlap of two sentences can be determined
simply as the number of common tokens between
the lexical representations of two sentences, or it
can be run through syntactic filters, which only
count words of a certain syntactic category. More-
over, to avoid promoting long sentences, we use a
normalization factor, and divide the content overlap
of two sentences with the length of each sentence.
The resulting graph is highly connected, with a
weight associated with each edge, indicating the
strength of the connections between various sen-
tence pairs in the text. The graph can be repre-
sented as: (a) simple undirected graph; (b) directed
weighted graph with the orientation of edges set
from a sentence to sentences that follow in the text
(directed forward); or (c) directed weighted graph
with the orientation of edges set from a sentence to
previous sentences in the text (directed backward).
After the ranking algorithm is run on the graph,
sentences are sorted in reversed order of their score,
and the top ranked sentences are selected for inclu-
sion in the extractive summary. Figure 1 shows an
example of a weighted graph built for a sample text
of six sentences.
2.2 Multiple Document Summarization
Multi-document summaries are built using a “meta”
summarization procedure. First, for each document
in a given cluster of documents, a single document
summary is generated using one of the graph-based
ranking algorithms. Next, a “summary of sum-
maries” is produced using the same or a different
ranking algorithm. Figure 2 illustrates the meta-
summarization process used to generate a multi-
document summary starting with a cluster of N
documents.
Unlike single documents – where sentences with
highly similar content are very rarely if at all en-
countered – it is often the case that clusters of mul-
tiple documents, all addressing the same or related
topics, would contain very similar or even identical
sentences. To avoid such pairs of sentences, which
may decrease the readability and the amount of in-
formation conveyed by a summary, we introduce a
maximum threshold on the sentence similarity mea-
sure. Consequently, in the graph construction stage,
no link (edge) is added between sentences (ver-
tices) whose similarity exceeds this threshold. In
Single−document summarization Single−document summarization
Summary Document 1
Summary Document 2
Summary Document N
......
Single−document summarization
Single−document summarization
Document 1 Document 2 Document N
Meta−document
Multi−document summary
Figure 2: Generation of a multi-document summary
using meta-summarization.
the experiments reported in this paper, this similar-
ity threshold was empirically set to 0.5.
3 Materials and Evaluation Methodology
Single and multiple English document summariza-
tion experiments are run using the summarization
test collection provided in the framework of the
Document Understanding Conference (DUC). In
particular, we use the data set of 567 news arti-
cles made available during the DUC 2002 evalu-
ations (DUC, 2002), and the corresponding 100-
word summaries generated for each of these doc-
uments (single-document summarization), or the
100-word summaries generated for each of the 59
document clusters formed on the same data set
(multi-document summarization). These are the
summarization tasks undertaken by other systems
participating in the DUC 2002 document summa-
rization evaluations.
To test the language independence aspect of the
algorithm, in addition to the English test collection,
we also use a Brazilian Portuguese data set con-
sisting of 100 news articles and their correspond-
ing manually produced summaries. We use the
TeM´ario test collection (Pardo and Rino, 2003),
containing newspaper articles from online Brazilian
newswire: 40 documents from Jornal de Brasil and
60 documents from Folha de S˜ao Paulo. The doc-
uments were selected to cover a variety of domains
(e.g. world, politics, foreign affairs, editorials), and
manual summaries were produced by an expert in
Brazilian Portuguese. Unlike the summaries pro-
duced for the English DUC documents – which had
a length requirement of approximately 100 words,
the length of the summaries in the TeM´ario data
set is constrained relative to the length of the corre-
21
sponding documents, i.e. a summary has to account
for about 25-30% of the original document. Con-
sequently, the automatic summaries generated for
the documents in this collection are not restricted to
100 words, as in the English experiments, but are
required to have a length comparable to the corre-
sponding manual summaries, to ensure a fair evalu-
ation.
For evaluation, we are using the ROUGE evalu-
ation toolkit1, which is a method based on Ngram
statistics, found to be highly correlated with human
evaluations (Lin and Hovy, 2003a). The evaluation
is done using the Ngram(1,1) setting of ROUGE,
which was found to have the highest correlation
with human judgments, at a confidence level of
95%.
4 Experimental Results
The extractive summarization algorithm is evalu-
ated in the context of: (1) A single-document sum-
marization task, where a summary is generated for
each of the 567 English news articles provided dur-
ing the Document Understanding Evaluations 2002
(DUC, 2002), and for each of the 100 Portuguese
documents in the TeM´ario data set; and (2) A multi-
document summarization task, where a summary is
generated for each of the 59 document clusters in
the DUC 2002 data. Since document clusters and
multi-document summaries are not available for the
Portuguese documents, a multi-document summa-
rization evaluation could not be conducted on this
data set. Note however that the multi-document
summarization tool is based on the single-document
summarization method (see Figure 2), and thus high
performance in single-document summarization is
expected to result into a similar level of perfor-
mance in multi-document summarization.
4.1 Single Document Summarization for
English
For single-document summarization, we evaluate
the extractive summaries produced using each of
the two graph-based ranking algorithms described
in Section 2 (HITS and PageRank). Table 1
shows the results obtained for the 100-words au-
tomatically generated summaries for the English
DUC 2002 data set. The table shows results us-
ing the two graph algorithms described in Section
2 when using graphs that are: (a) undirected, (b)
directed forward, or (c) directed backward2.
For a comparative evaluation, Table 2 shows the
results obtained on this data set by the top 5 (out
1ROUGE is available at http://www.isi.edu/˜cyl/ROUGE/.
2Note that the first two rows in the table are in fact redun-
dant, since the “hub” variation of the HITS algorithm can be
derived from its “authority” counterpart by reversing the edge
orientation in the graphs.
Graph
Algorithm Undir. Forward Backward
HITSWA 49.12 45.84 50.23
HITSWH 49.12 50.23 45.84
PageRankW 49.04 42.02 50.08
Table 1: Results for English single-document sum-
marization.
of 15) performing systems participating in the sin-
gle document summarization task at DUC 2002. It
also lists the baseline performance, computed for
100-word summaries generated by taking the first
sentences in each article.
Top 5 systems (DUC, 2002)
S27 S31 S28 S21 S29 Baseline
50.11 49.14 48.90 48.69 46.81 47.99
Table 2: Results for top 5 DUC 2002 single docu-
ment summarization systems, and baseline.
4.2 Single Document Summarization for
Portuguese
The single-document summarization tool was also
evaluated on the TeM´ario collection of Portuguese
newspaper articles. We used the same graph set-
tings as in the English experiments: graph-based
ranking algorithms consisting of either HITS or
PageRank, relying on graphs that are undirected,
directed forward, or directed backward. As men-
tioned in Section 3, the length of each automatically
generated summary was constrained to match the
length of the corresponding manual summary, for a
fair comparison. Table 3 shows the results obtained
on this data set, evaluated using the ROUGE evalu-
ation toolkit. A baseline was also computed, using
the first sentences in each document, and evaluated
at 0.4963.
Graph
Algorithm Undir. Forward Backward
HITSWA 48.14 48.34 50.02
HITSWH 48.14 50.02 48.34
PageRankW 49.39 45.74 51.21
Table 3: Results for Portuguese single-document
summarization.
4.3 Multiple Document Summarization
We evaluate multi-document summaries gener-
ated using combinations of the graph-based rank-
ing algorithms that were found to work best in
the single document summarization experiments –
PageRankW and HITSWA , on undirected or di-
rected backward graphs. Although the single docu-
ment summaries used in the “meta” summarization
22
process may conceivably be of any size, in this eval-
uation their length is limited to 100 words.
As mentioned earlier, different graph algorithms
can be used for producing the single document sum-
mary and the “meta” summary; Table 4 lists the
results for multi-document summarization experi-
ments using various combinations of graph algo-
rithms. For comparison, Table 5 lists the results ob-
tained by the top 5 (out of 9) performing systems
in the multi-document summarization task at DUC
2002, and a baseline generated by taking the first
sentence in each article.
Since no multi-document clusters and associ-
ated summaries were available for the other lan-
guage considered in our experiments, the multi-
document summarization experiments were con-
ducted only on the English data set. However, since
the multi-doc summarization technique consists of
a layered application of single-document summa-
rization, we believe that the performance achieved
in single-document summarization for Portuguese
would eventually result into similar performance
figures when applied to the summarization of clus-
ters of documents.
Top 5 systems (DUC, 2002)
S26 S19 S29 S25 S20 Baseline
35.78 34.47 32.64 30.56 30.47 29.32
Table 5: Results for top 5 DUC 2002 multi-
document summarization systems, and baseline.
4.4 Discussion
The graph-based extractive summarization algo-
rithm succeeds in identifying the most important
sentences in a text (or collection of texts) based on
information exclusively drawn from the text itself.
Unlike other supervised systems, which attempt to
learn what makes a good summary by training on
collections of summaries built for other articles, the
graph-based method is fully unsupervised, and re-
lies only on the given texts to derive an extractive
summary.
For single document summarization, the
HITSWA and PageRankW algorithms, run on
a graph structure encoding a backward direction
across sentence relations, provide the best per-
formance. These results are consistent across
languages – with similar performance figures
observed on both the English DUC data set and
on the Portuguese TeM´ario data set. The setting
that is always exceeding the baseline by a large
margin is PageRankW on a directed backward
graph, with clear improvements over the simple
(but powerful) first-sentence selection baseline.
Moreover, comparative evaluations performed
with respect to other systems participating in the
DUC 2002 evaluations revealed the fact that the
performance of the graph-based extractive summa-
rization method is competitive with state-of-the-art
summarization systems.
Interestingly, the “directed forward” setting is
consistently performing worse than the baseline,
which can be explained by the fact that both data
sets consist of newspaper articles, which tend to
concentrate the most important facts toward the be-
ginning of the document, and therefore disfavor a
forward direction set across sentence relations.
For multiple document summarization, the best
“meta” summarizer is the PageRankW algorithm
applied on undirected graphs, in combination with
a single summarization system using the HITSWA
ranking algorithm, for a performance similar to the
one of the best system in the DUC 2002 multi-
document summarization task.
The results obtained during all these experiments
prove that graph-based ranking algorithms, previ-
ously found successful in Web link analysis and so-
cial networks, can be turned into a state-of-the-art
tool for extractive summarization when applied to
graphs extracted from texts. Moreover, the method
was also shown to be language independent, lead-
ing to similar results when applied to the summa-
rization of documents in different languages.
The better results obtained by algorithms like
HITSWA and PageRank on graphs containing only
backward edges are likely to come from the fact that
recommendations flowing toward the beginning of
the text take advantage of the bias giving higher
summarizing value of sentences occurring at the be-
ginning of the document.
Another important aspect of the method is that
it gives a ranking over all sentences in a text (or
a collection of texts) – which means that it can be
easily adapted to extracting very short summaries,
or longer more explicative summaries.
4.5 Related Work
Extractive summarization is considered an impor-
tant first step for more sophisticated automatic text
summarization. As a consequence, there is a large
body of work on algorithms for extractive summa-
rization undertaken as part of the DUC evaluation
exercises (http://www-nlpir.nist.gov/projects/duc/).
Previous approaches include supervised learning
(Hirao et al., 2002), (Teufel and Moens, 1997), vec-
torial similarity computed between an initial ab-
stract and sentences in the given document, intra-
document similarities (Salton et al., 1997), or graph
algorithms (Mihalcea and Tarau, 2004), (Erkan and
Radev, 2004), (Wolf and Gibson, 2004). It is also
notable the study reported in (Lin and Hovy, 2003b)
discussing the usefulness and limitations of auto-
matic sentence extraction for text summarization,
23
Single document “Meta” summarization algorithm
summarization algo. PageRankW -U PageRankW -DB HITSWA -U HITSWA -DB
PageRankW -U 35.52 34.99 34.56 34.65
PageRankW -DB 35.02 34.48 35.19 34.39
HITSWA -U 33.68 32.59 32.12 34.23
HITSWA -DB 35.72 35.20 34.62 34.73
Table 4: Results for multi-document summarization (U = Undirected; DB = Directed Backward)
which emphasizes the need of accurate tools for
sentence extraction as an integral part of automatic
summarization systems.
5 Conclusions
Intuitively, iterative graph-based ranking algo-
rithms work well on the task of extractive summa-
rization because they do not only rely on the local
context of a text unit (vertex), but they rather take
into account information recursively drawn from
the entire text (graph). Through the graphs it builds
on texts, a graph-based ranking algorithm identifies
connections between various entities in a text, and
implements the concept of recommendation. A text
unit recommends other related text units, and the
strength of the recommendation is recursively com-
puted based on the importance of the units making
the recommendation. In the process of identifying
important sentences in a text, a sentence recom-
mends another sentence that addresses similar con-
cepts as being useful for the overall understanding
of the text. Sentences that are highly recommended
by other sentences are likely to be more informa-
tive for the given text, and will be therefore given a
higher score.
In this paper, we showed that a previously pro-
posed method for graph-based extractive summa-
rization can be successfully applied to the sum-
marization of documents in different languages,
without any requirements for additional knowl-
edge or corpora. Moreover, we showed how a
meta-summarizer relying on a layered application
of techniques for single-document summarization
can be turned into an effective method for multi-
document summarization. Experiments performed
on standard data sets have shown that the results ob-
tained with this method are comparable with those
of state-of-the-art systems for automatic summa-
rization, while at the same time providing the bene-
fits of a robust language independent algorithm.
Acknowledgments
We are grateful to Lucia Helena Machado Rino for
making available the TeM´ario summarization test
collection and for her help with this data set.
References
S. Brin and L. Page. 1998. The anatomy of a large-scale
hypertextual Web search engine. Computer Networks
and ISDN Systems, 30(1–7).
DUC. 2002. Document understanding conference 2002.
http://www-nlpir.nist.gov/projects/duc/.
G. Erkan and D. Radev. 2004. Lexpagerank: Prestige in
multi-document text summarization. In Proceedings
of the Conference on Empirical Methods in Natural
Language Processing, Barcelona, Spain, July.
T. Hirao, Y. Sasaki, H. Isozaki, and E. Maeda. 2002.
Ntt’s text summarization system for duc-2002. In
Proceedings of the Document Understanding Confer-
ence 2002.
J.M. Kleinberg. 1999. Authoritative sources in a hyper-
linked environment. Journal of the ACM, 46(5):604–
632.
C.Y. Lin and E.H. Hovy. 2003a. Automatic evalua-
tion of summaries using n-gram co-occurrence statis-
tics. In Proceedings of Human Language Technology
Conference (HLT-NAACL 2003), Edmonton, Canada,
May.
C.Y. Lin and E.H. Hovy. 2003b. The potential and lim-
itations of sentence extraction for summarization. In
Proceedings of the HLT/NAACL Workshop on Auto-
matic Summarization, Edmonton, Canada, May.
R. Mihalcea and P. Tarau. 2004. TextRank – bringing
order into texts. In Proceedings of the Conference on
Empirical Methods in Natural Language Processing
(EMNLP 2004), Barcelona, Spain.
R. Mihalcea, P. Tarau, and E. Figa. 2004. PageR-
ank on semantic networks, with application to word
sense disambiguation. In Proceedings of the 20st In-
ternational Conference on Computational Linguistics
(COLING 2004), Geneva, Switzerland.
T.A.S. Pardo and L.H.M. Rino. 2003. TeMario: a cor-
pus for automatic text summarization. Technical re-
port, NILC-TR-03-09.
G. Salton, A. Singhal, M. Mitra, and C. Buckley. 1997.
Automatic text structuring and summarization. Infor-
mation Processing and Management, 2(32).
S. Teufel and M. Moens. 1997. Sentence extraction
as a classification task. In ACL/EACL workshop on
”Intelligent and scalable Text summarization”, pages
58–65, Madrid, Spain.
F. Wolf and E. Gibson. 2004. Paragraph-, word-,
and coherence-based approaches to sentence ranking:
A comparison of algorithm and human performance.
In Proceedings of the 42nd Meeting of the Associa-
tion for Computational Linguistics, Barcelona, Spain,
July.
24
