Proceedings of the Human Language Technology Conference of the North American Chapter of the ACL, pages 181–184,
New York, June 2006. c©2006 Association for Computational Linguistics
Improved Affinity Graph Based Multi-Document Summarization 
Xiaojun Wan, Jianwu Yang 
Institute of Computer Science and Technology, Peking University 
Beijing 100871, China 
{wanxiaojun, yangjianwu}@icst.pku.edu.cn 
Abstract
This paper describes an affinity graph 
based approach to multi-document sum-
marization. We incorporate a diffusion 
process to acquire semantic relationships 
between sentences, and then compute in-
formation richness of sentences by a 
graph rank algorithm on differentiated in-
tra-document links and inter-document 
links between sentences. A greedy algo-
rithm is employed to impose diversity 
penalty on sentences and the sentences 
with both high information richness and 
high information novelty are chosen into 
the summary. Experimental results on 
task 2 of DUC 2002 and task 2 of DUC 
2004 demonstrate that the proposed ap-
proach outperforms existing state-of-the-
art systems. 
1 Introduction 
Automated multi-document summarization has 
drawn much attention in recent years. Multi-
document summary is usually used to provide con-
cise topic description about a cluster of documents 
and facilitate the users to browse the document 
cluster. A particular challenge for multi-document 
summarization is that the information stored in 
different documents inevitably overlaps with each 
other, and hence we need effective summarization 
methods to merge information stored in different 
documents, and if possible, contrast their differ-
ences.
A variety of multi-document summarization 
methods have been developed recently. In this 
study, we focus on extractive summarization, 
which involves assigning saliency scores to some 
units (e.g. sentences, paragraphs) of the documents 
and extracting t Ke sentences with highest scores. 
MEAD is an implementation of the centroid-based 
method (Radev et al., 2004) that scores sentences 
based on sentence-level and inter-sentence features, 
including cluster centroids, position, TF*IDF, etc. 
NeATS (Lin and Hovy, 2002) selects important 
content using  Ventence position, term frequency, 
topic signature and term clustering, and then uses 
MMR (Goldstein et al., 1999) to remove redun-
dancy. XDoX (Hardy et al., 1998) identifies the 
most salient themes within the set by passage clus-
tering and then composes an extraction summary, 
which reflects these main themes. Harabagiu and 
Lacatusu (2005) investigate different topic repre-
sentations and extraction methods. 
Graph-based methods have been proposed to 
rank sentences or passages. Websumm (Mani and 
Bloedorn, 2000) uses a graph-connectivity model 
and operates under the assumption that nodes 
which are connected to many other nodes are likely 
to carry salient information. LexPageRank (Erkan 
and Radev, 2004) is an approach for computing 
sentence importance based on the concept of ei-
genvector centrality. Mihalcea and Tarau (2005) 
also propose similar algorithms based on PageR-
ank and HITS to compute sentence importance for 
document summarization.  
In this study, we extend the above graph-based 
works by proposing an integrated framework for 
considering both information richness and infor-
mation novelty of a sentence based on sentence 
affinity graph. First, a diffusion process is imposed 
on sentence affinity graph in order to make the af-
finity graph reflect true semantic relationships be-
tween sentences. Second, intra-document links and 
inter-document links between sentences are differ-
entiated to attach more importance to inter-
document links for sentence information richness 
computation. Lastly, a diversity penalty process is 
imposed on sentences to penalize redundant sen-
tences. Experiments on DUC 2002 and DUC 2004 
data are performed and we obtain encouraging re-
sults and conclusions. 
181
2 The Affinity Graph Based Approach 
The proposed affinity graph based summarization
method consists of three steps: (1) an affinity graph
is built to reflect the semantic relationship between
sentences in the document set; (2) information
richness of each sentence is computed based on the
affinity graph; (3) based on the affinity graph and 
the information richness scores, diversity penalty is 
imposed to sentences and the affinity rank score 
for each sentence is obtained to reflect both infor-
mation richness and information novelty of the 
sentence. The sentences with high affinity rank 
scores are chosen to produce the summary.
2.1 Affinity Graph Building
Given a sentence collection S={si | 1 � i � n}, the af-
finity weight aff(si, sj) between a sentence pair of si
and sj  is calculated using the cosine measure. The 
weight associated with term t is calculated with the
tft*isft formula, where tft is the frequency of term t
in the corresponding sentence and isft is the inverse 
sentence frequency of term t, i.e. 1+log(N/nt),
where N is the total number of sentences and nt is 
the number of sentences containing term t.  If sen-
tences are considered as nodes, the sentence collec-
tion can be modeled as an undirected graph by
generating the link between two sentences if their
affinity weight exceeds 0, i.e. an undirected link
between si and sj (i � j) with affinity weight aff(si,sj)
is constructed if aff(si,sj)>0; otherwise no link is 
constructed. Thus, we construct an undirected
graph G reflecting the semantic relationship be-
tween sentences by their content similarity. The
graph is called as Affinity Graph. We use an adja-
cency (affinity) matrix M to describe the affinity
graph with each entry corresponding to the weight 
of a link in the graph. M = (Mi,j)n�n is defined as 
follows:
)s,s(affM jij,i   (1)
Then M is normalized to make the sum of each
row equal to 1. Note that we use the same notation 
to denote a matrix and its normalized matrix.
However, the affinity weight between two sen-
tences in the affinity graph is currently computed
simply based on their own content similarity and 
ignore the affinity diffusion process on the graph.
Other than the direct link between two sentences,
the possible paths with more than two steps be-
tween the sentences in the graph also convey more
or less semantic relationship. In order to acquire 
the implicit semantic relationship between sen-
tences, we apply a diffusion process  Kandola et 
al., 2002  on the graph to obtain a more appropri-
ate affinity matrix. Though the number of possible 
paths between any two given nodes can grow ex-
ponentially, recent spectral graph theory (Kondor
and Lafferty, 2002) shows that it is possible to
compute the affinity between any two given nodes
efficiently without examining all possible paths. 
The diffusion process on the graph is as follows: 
t1t
1t
~ MM - � f
     J
(2)
where  (0< <1) is the decay factor set to 0.9. 
is the t-th power of the initial affinity matrix
and the entry in it is given by
tM
M
 �  �
    
 �
 
  
 
  
ju,iu
n}{1,...,u
1t
1
u,u
t
ji
t1
t 1
MM
 "
 " ",
(3)
that is the sum of the products of the weights over 
all paths of length t that start at node i and finish at 
node j in the graph on the examples. If the entries 
satisfy that they are all positive and for each node 
the sum of the connections is 1, we can view the 
entry as the probability that a random walk begin-
ning at node i reaches node j after t steps.  The ma-
trix M is normalized to make the sum of each row
equal to 1. t is limited to 5 in this study. 
~
2.2 Information Richness Computation 
The computation of information richness of sen-
tences is based on the following three intuitions: 1) 
the more neighbors a sentence has, the more in-
formative it is; 2) the more informative a sen-
tence�s neighbors are, the more informative it is; 3) 
the more heavily a sentence is linked with other
informative sentences, the more informative it is.
Based on the above intuitions, the information
richness score InfoRich(si) for a sentence si can be
deduced from those of all other sentences linked
with it and it can be formulated in a recursive form
as follows: 
 �
 z
   � �  
ijall
i,jji n
)d1(M~)s(InfoRichd)s(InfoRich
(4)
And the matrix form is: 
en )d1(~d T  & & &      O O M (5)
182
where 1ni )]s(InfoRich[  u   O
 &
is the eigenvector of 
. is a unit vector with all elements equaling 
to 1. d is the damping factor set to 0.85.
T~M e &
Note that given a link between a sentence pair of 
si and sj, if si and sj comes from the same document,
the link is an intra-document link; and if si and sj
comes from different documents, the link is an in-
ter-document link. We believe that inter-document
links are more important than intra-document links 
for information richness computation  Different
weights are assigned to intra-document links and 
inter-document links respectively, and the new af-
finity matrix is:
interintra
~~� MMM  E D    (6)
where intra~M is the affinity matrix containing only
the intra-document links (the entries of inter-
document links are set to 0) and inter~M is the affin-
ity matrix containing only the inter-document links 
(the entries of intra-document links are set to 0). . ,
  are weighting parameters and we let 0 �. ,  � 1.
 7he matrix is normalized and now the matrix  is 
replaced by  in Equations (4) and (5). 
M~
M�
2.3 Diversity Penalty Imposition 
Based on the affinity graph and obtained informa-
tion richness scores, a greedy algorithm is applied
to impose the diversity penalty and compute the
final affinity rank scores of sentences as follows: 
1. Initialize two sets A=�, B={si | i=1,2,�,n}, and
each sentence�s affinity rank score is initialized to 
its information richness score, i.e. ARScore(si) = 
InfoRich(si), i=1,2,�n.
2. Sort the sentences in B by their current affinity rank
scores in descending order.
3. Suppose si is the highest ranked sentence, i.e. the
first sentence in the ranked list. Move sentence si
from B to A, and then a diversity penalty is im-
posed to the affinity rank score of each sentence
linked with si as follows:
For each sentence sj  in B, we have
)InfoRich(sM~&)ARScore(s)ARScore(s iij,jj  � �   (7)
where & >0 is the penalty degree factor. The larger
&  is, the greater penalty is imposed to the affinity
rank score. If & =0, no diversity penalty is imposed
at all. 
4. Go to step 2 and iterate until B= � or the iteration
count reaches a predefined maximum number.
After the affinity rank scores are obtained for all
sentences, the sentences with highest affinity rank 
scores are chosen to produce the summary accord-
ing to the summary length limit.
3 Experiments and Results
We compare our system with top 3 performing
systems and two baseline systems on task 2 of 
DUC 2002 and task 4 of DUC 2004 respectively.
ROUGE (Lin and Hovy, 2003) metrics is used for
evaluation1 and we mainly concern about ROUGE-
1. The parameters of our system are tuned on DUC 
2001 as follows: & =7, . =0.3 and  =1.
We can see from the tables that our system out-
performs the top performing systems and baseline 
systems on both DUC 2002 and DUC 2004 tasks 
over all three metrics. The performance improve-
ment achieved by our system results from three
factors: diversity penalty imposition, intra-
document and inter-document link differentiation
and diffusion process incorporation. The ROUGE-
1 contributions of the above three factors are 
0.02200, 0.00268 and 0.00043 respectively.
System ROUGE-1 ROUGE-2 ROUGE-W
Our System 0.38125 0.08196 0.12390 
S26 0.35151 0.07642 0.11448 
S19 0.34504 0.07936 0.11332 
S28 0.34355 0.07521 0.10956 
Coverage Baseline 0.32894 0.07148 0.10847 
Lead Baseline 0.28684 0.05283 0.09525 
Table 1. System comparison on task 2 of DUC 2002
System ROUGE-1 ROUGE-2 ROUGE-W
Our System 0.41102 0.09738 0.12560 
S65 0.38232 0.09219 0.11528 
S104 0.37436 0.08544 0.11305 
S35 0.37427 0.08364 0.11561 
Coverage Baseline 0.34882 0.07189 0.10622 
Lead Baseline 0.32420 0.06409 0.09905 
Table 2. System comparison on task 2 of DUC 2004
Figures 1-4 show the influence of the parameters 
in our system. Note that . :   denotes the real val-
ues .  and   are set to. �w/ diffusion� is the system
with the diffusion process (our system) and  �w/o
diffusion� is the system without the diffusion proc-
1 We use ROUGEeval-1.4.2 with �-l� or �-b� option for trun-
cating longer summaries, and �-m� option for word stemming. 
183
ess. The observations demonstrate that �w/ diffu-
sion� performs better than �w/o diffusion� for most
parameter settings. Meanwhile, �w/ diffusion� is
more robust than �w/o diffusion� because the 
ROUGE-1 value of �w/ diffusion� changes less
when the parameter values vary. Note that in Fig-
ures 3 and 4 the performance decreases sharply 
with the decrease of the weight   of inter-
document links and it is the worst case when inter-
document links are not taken into account (i.e. . :
 =1:0), while if intra-document links are not taken 
into account (i.e. . : =0:1), the performance is still
good, which demonstrates the great importance of 
inter-document links. 
     
    
     
    
     
    
     
                      
 5 2 8 * (  
 Z  R  G L I I X V L R Q
 Z   G L I I X V L R Q
 Z
Figure 1. Penalty factor tuning on task 2 of DUC 2002
    
    
    
    
   
    
    
                      
 5 2 8 * (  
 Z  R  G L I I X V L R Q
 Z   G L I I X V L R Q
 Z
Figure 2. Penalty factor tuning on task 2 of DUC 2004
    
    
    
    
    
    
    
                                                           
 
 5 2 8 * (  
 Z  R  G L I I X V L R Q
 Z   G L I I X V L R Q
 D  E
Figure3. Intra- & Inter-document link weight tuning on
task 2 of DUC 2002
    
    
    
    
    
    
    
  
 
  
  
 
  
  
 
  
  
 
  
  
 
  
  
 
  
 
  
  
 
  
  
 
  
  
 
  
  
 
  
  
 
  
 
 
 5 2 8 * (   Z  R  G L I I X V L R Q
 Z   G L I I X V L R Q
 D  E
Figure 4. Intra- & Inter-document link weight tuning on
task 2 of DUC 2004
References
G. Erkan and D. Radev. LexPageRank: prestige in multi-
document text summarization. In Proceedings of
EMNLP�04
J. Goldstein, M. Kantrowitz, V. Mittal, and J. Carbonell.
Summarizing Text Documents: Sentence Selection and 
Evaluation Metrics. Proceedings of SIGIR-99. 
S. Harabagiu and F. Lacatusu. Topic themes for multi-
document summarization. In Proceedings of SIGIR�05,
Salvador, Brazil, 202-209, 2005. 
H. Hardy, N. Shimizu, T. Strzalkowski, L. Ting, G. B. Wise,
and X. Zhang. Cross-document summarization by con-
cept classification. In Proceedings of SIGIR�02, Tampere,
Finland, 2002. 
J. Kandola, J. Shawe-Taylor, N. Cristianini. Learning semantic 
similarity. In Proceedings of NIPS�2002.
K. Knight and D. Marcu. Summarization beyond sentence
extraction: a probabilistic approach to sentence compres-
sion, Artificial Intelligence, 139(1), 2002.
R. I. Kondor and J. Lafferty. Diffusion kernels on graphs and
other discrete structures. In Proceedings of ICML�  02.
C.-Y. Lin and E.H. Hovy. Automatic Evaluation of Summa-
ries Using N-gram Co-occurrence Statistics. In Proceed-
ings of HLT-NAACL 2003. 
C.-Y. Lin and E.H. Hovy. From Single to Multi-document
Summarization: A Prototype System and its Evaluation.
In Proceedings of ACL-  02.
I. Mani and E. Bloedorn. Summarizing Similarities and Dif-
ferences Among Related Documents. Information Re-
trieval, 1(1), 2000.
R. Mihalcea and P. Tarau. A language independent algorithm
for single and multiple document summarization. In Pro-
ceedings of IJCNLP�  05.
D. R. Radev, H. Y. Jing, M. Stys and D. Tam. Centroid-based
summarization of multiple documents. Information Proc-
essing and Management, 40: 919-938, 2004.
184
