Proceedings of the COLING/ACL 2006 Student Research Workshop, pages 37–42,
Sydney, July 2006. ©2006 Association for Computational Linguistics
Investigations on Event-Based Summarization 
 
 
Mingli Wu 
Department of Computing 
The Hong Kong Polytechnic University 
Kowloon, Hong Kong 
csmlwu@comp.polyu.edu.hk 
 
 
 
Abstract 
We investigate independent and relevant event-based extractive multi-document summarization approaches. In this paper, events are defined as event terms and associated event elements. In the independent approach, we identify important content by the frequency of events. In the relevant approach, we identify important content by running the PageRank algorithm on the event map constructed from the documents. Experimental results are encouraging.
1 Introduction 
With the growth of online information, it is inefficient for a computer user to browse a great number of individual news documents. Automatic summarization is a powerful way to overcome this difficulty. However, the research literature demonstrates that machine summaries still need further improvement.
The previous research on text summarization dates back to (Luhn, 1958) and (Edmundson, 1969). Since then, some researchers have focused on extraction-based summarization, as it is effective and simple. Others have tried to generate abstracts, but these works are highly domain-dependent or remain preliminary investigations. Recently, query-based summarization has received much attention. However, it is closely related to information retrieval, another research subject. In this paper, we focus on generic summarization. News reports are crucial to our daily life, so we concentrate on effective summarization approaches for news reports.
Extractive summarization has been widely investigated in the past. It extracts parts of the document(s) based on some weighting scheme in which different features are exploited, such as position in the document, term frequency, and key phrases. Recent extraction approaches may also employ machine learning to decide which sentences or phrases should be extracted. They have achieved preliminary success in different applications but remain to be improved further.
Previous extractive approaches identify important content mainly based on terms. A bag of words is not a good representation for specifying an event, since there are multiple possible interpretations of the same collection of words. A predefined template is a better choice to represent an event. However, it is domain-dependent and needs much effort to create and fill. This tension motivates us to seek a balance between effective implementation and deep understanding.
Following related works (Filatova and Hatzivassiloglou, 2004; Vanderwende et al., 2004), we assume that an event may be a natural unit to convey the meaning of documents. In this paper, an event is defined as the collection of event terms and associated event elements at the clause level. Event terms express the meaning of actions themselves, such as "incorporate". In addition to verbs, action nouns can also express the meaning of actions and should be regarded as event terms; for example, "incorporation" is an action noun. Event elements include named entities, such as person names, organization names, locations, and times. These named entities are tagged with GATE (Cunningham et al., 2002). Based on our event definition, independent and relevant event-based approaches are investigated in this research. Experiments show that both of them achieve encouraging results.
Related works are discussed in Section 2. The independent event-based summarization approach is described in Section 3, and the relevant event-based approach in Section 4. Section 5 presents the experiments and evaluations. The strengths and limitations of our approaches are then discussed in Section 6. Finally, we conclude the work in Section 7.
2 Related Work 
Term-based extractive summarization dates back to (Luhn, 1958) and (Edmundson, 1969). This approach is simple but widely applicable. It represents the content of documents mainly by a bag of words. Luhn (1958) establishes a set of "significant" words, whose frequency lies between an upper bound and a lower bound. Edmundson (1969) collects common words, cue words, and title/heading words from documents. Weight scores of sentences are computed based on the type/frequency of terms, and sentences with higher scores are included in summaries. Later researchers adopt the tf*idf score to discriminate words (Brandow et al., 1995; Radev et al., 2004). Other surface features are also exploited to extract important sentences, such as sentence position and sentence length (Teufel and Moens, 1999; Radev et al., 2004). To make the extraction model suitable for documents in different domains, machine learning approaches have recently been widely employed (Kupiec et al., 1995; Conroy and Schlesinger, 2004).
To represent the deeper meaning of documents, other researchers have investigated different structures. Barzilay and Elhadad (1997) segment the original text and construct lexical chains, employing strong chains to represent important parts of documents. Marcu (1997) describes a rhetorical parsing approach which takes unrestricted text as input and derives a rhetorical structure tree, expressing documents with such trees. DeJong (1978) adopts predefined templates to express documents: for each topic, the user predefines frames of expected information types, together with recognition criteria. However, these approaches achieve only moderate results.
Recently, events have received attention as a way to represent documents. Filatova and Hatzivassiloglou (2004) define an event as an action (verb/action noun) plus named entities. After identifying actions and event entities, they adopt a frequency weighting scheme to identify important sentences. Vanderwende et al. (2004) represent events by dependency triples; after analyzing the triples, they connect nodes (words or phrases) by semantic relationships. Yoshioka and Haraguchi (2004) adopt a similar approach to build a map, but they regard sentences as the nodes of the map. After constructing a map representation of the documents, Vanderwende et al. (2004) and Yoshioka and Haraguchi (2004) both employ the PageRank algorithm to select the important sentences. Although these approaches employ an event representation and the PageRank algorithm, it should be noted that our event representation is different from theirs: it is based on named entities and event terms, without the help of dependency parsing. These previous event-based approaches achieved promising results.
3 Independent Event-based Summarization
Based on our observation, we assume that events in the documents may have different importance. Important event terms will be repeated and tend to occur with more event elements, because reporters hope to state them clearly. At the same time, people may omit the time or location of an important event once they have described the event previously. Therefore, in our research, event terms occurring in different contexts are assigned different weights. Event terms occurring between two event elements should be more important than event terms occurring beside just one event element, and event terms co-occurring with participants may be more important than those beside only a time or location.
The independent event-based summarization approach involves the following steps.
1. Given a cluster of documents, analyze each sentence one at a time. Ignore sentences that do not contain any event element.
2. Tag the event terms in the sentence that lie between two event elements or near one event element within a distance limit. For example: [Event Element A, Event Term, Event Element B], [Event Term, Event Element A], [Event Element A, Event Term].
3. Assign different weights to different event terms according to their contexts. Different weight configurations are described in Section 5.2. Context here refers to the number of event elements beside an event term and the types of these event elements.
4. Compute the average tf*idf score as the weight of every event term or event element. The algorithm is similar to Centroid.
5. Sum up the weights of the event terms and event elements in each sentence.
6. Select the sentences with the highest weights, according to the length of the summary.
4 Relevant Event-based Summarization  
Independent event-based approaches do not exploit the relevance between events. However, we believe that such relevance may be useful for identifying important events. After a document is represented by events, relevant events are linked together. We make the assumption that important events may be mentioned often, and that events associated with important events may also be important. Given this assumption, PageRank is a suitable algorithm for identifying the importance of events in a map. In the following sections, we discuss how to represent documents by events and how to identify important events with the PageRank algorithm.
4.1 Document Representation 
We employ an event map to represent the content of a document cluster, which is about a certain topic. In an event map, nodes are event terms or event elements, and edges represent association or modification between two nodes. Since the sentence is a natural unit for expressing meaning, we assume that all event terms in a sentence are relevant and should be linked together. The links between nodes are undirected.
In an ideal case, event elements should be linked to their associated event terms. At the same time, an event element may modify another element; for example, one element may be a head noun and another its modifier. An event term (e.g., a verb variant) may modify an event element or the event term of another event. In this case, a full parser would be needed to obtain the associations or modifications between different nodes in the map. Because the performance of current parsing technology is not perfect, an effective approach is to approximate the parse tree so as to avoid introducing parser errors. The simplifications are as follows: only event elements are attached to their corresponding event terms; an event term is not attached to an event element of another event; and an event element is not attached to another event element. Heuristics are used to attach event elements to their corresponding event terms.
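One plausible attachment heuristic, sketched below, links each event element to the nearest event term by token distance. The paper does not spell out its exact attachment rules, so this is an illustrative assumption rather than the authors' procedure.

```python
def attach_elements(tokens):
    """Attach each event element ("ELEM") to its nearest event term
    ("TERM") by token distance -- one plausible heuristic for building
    term-element edges without a full parser."""
    terms = [i for i, (_, tag) in enumerate(tokens) if tag == "TERM"]
    edges = []
    for i, (tok, tag) in enumerate(tokens):
        if tag == "ELEM" and terms:
            nearest = min(terms, key=lambda t: abs(t - i))
            edges.append((tokens[nearest][0], tok))
    return edges
```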
Given the sentence "Andrew had become little more than a strong rainstorm early yesterday, moving across Mississippi state and heading for the north-eastern US", the event map is shown in Fig. 1. After each sentence is represented by a map, there will be multiple maps for a cluster of documents. If nodes from different maps match lexically, they may denote the same thing and should be linked. For example, if the named entity "Andrew" occurs in sentences A, B and C, then the three occurrences O_A, O_B and O_C will be linked as O_A—O_B, O_B—O_C and O_C—O_A. In this way, the maps for individual sentences can be linked through shared concepts.
 
 
Figure 1. Document representation with an event map
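The linking scheme can be sketched as follows: nodes within a sentence are fully linked, and occurrences of the same lexical item across sentences are linked pairwise, as in the Andrew example. The graph representation, adjacency sets keyed by (sentence, word) pairs, is our own simplification.

```python
from collections import defaultdict

def build_event_map(sentence_nodes):
    """Merge per-sentence node lists into one undirected map:
    nodes in the same sentence are linked, and occurrences of the
    same lexical item across sentences are linked pairwise."""
    graph = defaultdict(set)
    occurrences = defaultdict(list)          # lexical form -> occurrence ids
    for s, nodes in enumerate(sentence_nodes):
        ids = [(s, w) for w in nodes]
        for a in ids:                        # intra-sentence links
            for b in ids:
                if a != b:
                    graph[a].add(b)
        for w in nodes:
            occurrences[w].append((s, w))
    for occs in occurrences.values():        # cross-sentence lexical links
        for a in occs:
            for b in occs:
                if a != b:
                    graph[a].add(b)
    return graph
```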
4.2 Importance Identification by PageRank 
Given a whole map for a cluster of documents, the next step is to identify the focus of these documents. Based on our assumption about important content in the previous section, the PageRank algorithm (Page et al., 1998) is employed for this task. PageRank assumes that if a node is connected to more other nodes, it is more likely to represent a salient concept, and that nodes linked to significant nodes are closer to the salient concept than those that are not. The algorithm assigns a significance score to each node according to the number of nodes linking to it as well as the significance of those nodes. In the PageRank algorithm, we replace every undirected link in Figure 1 with two directed links.
The equation to calculate the importance (denoted PR) of a certain node A is shown as follows:

PR(A) = (1 - d) + d * ( PR(B_1)/C(B_1) + PR(B_2)/C(B_2) + ... + PR(B_t)/C(B_t) )
where B_1, B_2, ..., B_t are all the nodes that link to node A, and C(B_i) is the number of outgoing links from node B_i. The weight score of each node can be obtained by applying this equation recursively. d is the damping factor, used to handle loops in the map structure; following (Page et al., 1998), d is set to 0.85. The significance of each sentence to be included in the summary is then derived from the significance of the event terms and event elements it contains.
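A minimal iterative implementation of the PR equation might look like this. The fixed iteration count stands in for a proper convergence test, and the dictionary graph format is an assumption of ours rather than the paper's data structure.

```python
def pagerank(graph, d=0.85, iters=50):
    """Iterate PR(A) = (1 - d) + d * sum_B PR(B)/C(B) over the nodes
    B that link to A, where C(B) is B's number of outgoing links.
    `graph` maps each node to the set of nodes it links to; storing
    every undirected edge in both directions matches Section 4.2."""
    pr = {n: 1.0 for n in graph}
    for _ in range(iters):
        new = {}
        for a in graph:
            incoming = (b for b in graph if a in graph[b])
            new[a] = (1 - d) + d * sum(pr[b] / len(graph[b]) for b in incoming)
        pr = new
    return pr
```

On a small chain a—b—c (edges stored both ways), the middle node b collects score from both neighbors and comes out most significant, matching the intuition that well-connected nodes represent salient concepts.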
5 Evaluation 
5.1 Dataset and Evaluation Metrics 
The DUC 2001 dataset is employed to evaluate our summarization approaches. It contains 30 clusters and a total of 308 documents; the number of documents in each cluster is between 3 and 20. The documents come from English news agencies such as the Wall Street Journal. The contents of each cluster concern a specific topic, such as a hurricane in Florida. For each cluster, there are 3 different model summaries, provided manually. These model summaries were created by NIST assessors for the DUC generic summarization task. Manual summaries of 50, 100, 200 and 400 words are provided.
Since manual evaluation is time-consuming and may be subjective, the typical evaluation package ROUGE (Lin and Hovy, 2003) is employed to test the quality of summaries. ROUGE compares machine-generated summaries with manually provided summaries based on unigram overlap, bigram overlap, and long-distance overlap. It is a recall-based measure and requires that the length of the summaries be limited to allow meaningful comparison. ROUGE is not a comprehensive evaluation method; it is intended to provide a rough description of the performance of a machine-generated summary.
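For concreteness, ROUGE-N recall can be sketched as clipped n-gram matches divided by the reference n-gram count. The real ROUGE package adds stemming, stopword options and multi-reference handling, which are omitted in this sketch.

```python
from collections import Counter

def rouge_n_recall(candidate, reference, n=1):
    """Recall-oriented n-gram overlap: clipped matches over the
    number of n-grams in the reference summary."""
    def ngrams(text):
        tokens = text.split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    match = sum(min(c, ref[g]) for g, c in cand.items())
    return match / max(1, sum(ref.values()))
```

Because the denominator is the reference length, longer candidate summaries can only help recall, which is why ROUGE comparisons require a length limit.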
5.2 Experimental Configuration 
In the following experiments on independent event-based summarization, we investigate the effectiveness of the approach. In addition, we test the importance of contextual information in scoring event terms. The number of associated event elements and their types are considered when setting the weights of event terms. The weight parameters in the following experiments are chosen according to empirical estimation.
Experiment 1: The weight of any entity is 1. The weight of any verb/action noun that is between two entities or just beside one entity is 1.
Experiment 2: The weight of any entity is 1. The weight of any verb/action noun that is between two entities is 3; the weight of any verb/action noun that is just beside one entity is 1.
Experiment 3: The weight of any entity is 1. The weight of a verb/action noun between two entities is 5 if the first entity is a person or organization, and 3 otherwise. The weight of a verb/action noun just after a person or organization is 2. The weight of a verb/action noun just before one entity is 1, and the weight of a verb/action noun just after an entity that is neither a person nor an organization is also 1.
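The three weighting configurations above can be captured in a single hypothetical function. The entity-type labels ("PERSON", "ORG", ...) are our own placeholders, not the GATE tag set.

```python
def term_weight(left_elem, right_elem, config=3):
    """Weight of a verb/action noun given its flanking event elements.
    left_elem/right_elem are None or an entity-type string such as
    "PERSON", "ORG", "LOC", "TIME". Encodes Experiments 1-3."""
    pers_org = ("PERSON", "ORG")
    if config == 1:                          # any adjacent entity -> 1
        return 1 if (left_elem or right_elem) else 0
    if config == 2:                          # between two entities -> 3
        if left_elem and right_elem:
            return 3
        return 1 if (left_elem or right_elem) else 0
    # config == 3: entity types matter
    if left_elem and right_elem:
        return 5 if left_elem in pers_org else 3
    if left_elem:                            # term just after one entity
        return 2 if left_elem in pers_org else 1
    if right_elem:                           # term just before one entity
        return 1
    return 0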
In the following experiments, we investigate the effectiveness of our approaches under different summary length limits. Based on the algorithm of Experiment 3, we generate summaries of 50, 100, 200 and 400 words; these runs are named Experiment 4, Experiment 5, Experiment 3 and Experiment 6, respectively.
In the other experiments, on relevant event-based summarization, we investigate the role of relevance between events. The configurations are as follows.
Experiment 7: Event terms and event elements are identified as discussed in Section 3. In this experiment, event elements include only named entities. Occurrences of event terms or event elements are linked by exact matches. Finally, PageRank is employed to select important events and then important sentences.
Experiment 8: For reference, we select one of the model summaries as the final summary for each cluster of documents. ROUGE is employed to evaluate the performance of these manual summaries.
5.3 Experimental Results 
The experimental results on independent event-based summarization are shown in Table 1. The results for relevant event-based summarization are shown in Table 3.
 
 Exp. 1 Exp. 2 Exp. 3 
Rouge-1 0.315 0.322 0.323 
Rouge-2 0.049 0.055 0.055 
Rouge-L 0.299 0.305 0.306 
Table 1. Results on independent event-based 
summarization (summary with length of 200 
words) 
 
From Table 1, we can see that the results of Experiment 2 are better than those of Experiment 1. This supports our assumption that the importance of event terms differs when these event terms occur with different numbers of event elements. The results of Experiment 3 are not significantly better than those of Experiment 2, which suggests that the importance of event terms does not vary much with the types of co-occurring event elements. Another possible explanation is that, after further adjustment of the event term weights, the difference between the results of Experiments 2 and 3 might grow.
 Exp. 4 Exp. 5 Exp. 3 Exp. 6
Rouge-1 0.197 0.249 0.323 0.382
Rouge-2 0.021 0.031 0.055 0.081
Rouge-L 0.176 0.231 0.306 0.367
Table 2. Results on independent event-based summarization (summaries of different lengths)
 
The four experiments in Table 2 show that the performance of our event-based summarization improves as the length of the summaries increases. One reason is that the event-based approach prefers sentences with more event terms and event elements, so the preferred sentences are longer. In a short summary, people condense sentences from the original documents and use new words to substitute for the original concepts, so the Rouge score, which measures recall, is not good for our event-based approach. In contrast, for longer summaries, people adopt the detailed event descriptions in the original documents, and our performance improves.
 
 
 Exp. 7 Exp. 8 
Rouge-1 0.325 0.595 
Rouge-2 0.060 0.394 
Rouge-L 0.305 0.586 
Table 3. Results on relevant event-based 
summarization and a reference experiment 
(summary with length of 200 words) 
 
In Table 3, we find that the Rouge scores of relevant event-based summarization (Experiment 7) are better than those of the independent approach (Experiment 1). In neither Experiment 1 nor Experiment 7 do we discriminate the weights of event elements and event terms, so it is fair to compare Experiment 7 with Experiment 1, but not with Experiment 3. It appears that the relevance between nodes (event terms or event elements) helps to improve performance. However, the performance of both relevant and independent event-based summarization needs further improvement compared with the human performance in Experiment 8.
6 Discussion 
As discussed in Section 2, event-based approaches have also been employed in previous works, and we evaluate our work in this context. Since the event-based approaches in this paper are similar to that of Filatova and Hatzivassiloglou (2004), and the evaluation data set is the same, our results are compared with theirs.
Figure 2. Results reported in (Filatova and Hatzivassiloglou, 2004)
 
 
Figure 3. Results of the relevant event-based approach
 
Filatova and Hatzivassiloglou (2004) report ROUGE scores for each cluster of the DUC 2001 data collection in Figure 2. In that figure, the bold line represents their event-based approach and the light line the tf*idf approach; the event-based approach performs better. The evaluation of the relevant event-based approach presented in this paper is shown in Figure 3. The proposed approach achieves significant improvement on most document clusters, which we attribute to its exploitation of the relevance between events.
Centroid is a successful term-based summarization approach. For comparison, we employ MEAD (Radev et al., 2004) to generate Centroid-based summaries. The results show that Centroid is better than our relevant event-based approach. After comparing the summaries given by the two approaches, we found some limitations of our approach.
The event-based approach does not work well on documents with rare events. We plan to discriminate the type of documents and apply the event-based approach only to suitable documents. Our relevant event-based approach is instance-based and too sensitive to the number of instances of entities. Concepts seem better suited to represent the meanings of events, as they are what we really care about. In the future, the event map will be built from concepts and the relationships between them, and external knowledge may be exploited to refine this concept map.
7 Conclusion 
In this study, we investigated generic summarization. An event-based scheme was employed to represent documents and identify important content. The independent event-based approach identified important content according to event frequency. We also investigated the varying importance of event terms in different contexts; experiments showed that this idea achieved promising results. We then explored summarization under different length limits and found that our independent event-based approach performed well with longer summaries.
In the relevant event-based approach, events were linked together by the same or similar event terms and event elements. Experiments showed that the relevance between events can improve summarization performance. Compared with closely related work, we achieved encouraging improvement.
References  
Regina Barzilay, and Michael Elhadad. 1997. Using 
lexical chains for text summarization. In Proceed-
ings of the ACL’97/EACL’97 Workshop on Intel-
ligent Scalable Text Summarization, 10-17. 
Ronald Brandow, Karl Mitze, and Lisa F. Rau. 1995. 
Automatic condensation of electronic publications 
by sentence selection. Information Processing and 
Management 31(5):675-686. 
John M. Conroy and Judith D. Schlesinger. 2004. 
Left-brain/right-brain multi-document summariza-
tion. Available at http://duc.nist.gov/pubs.html
Hamish Cunningham, Diana Maynard, Kalina 
Bontcheva, Valentin Tablan. 2002. GATE: a 
framework and graphical development environ-
ment for robust NLP tools and applications. In 
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL'02).
Gerald Francis DeJong. 1978. Fast skimming of news 
stories: the FRUMP system. Ph.D. thesis, Yale 
University. 
H.P. Edmundson. 1969. New methods in automatic extracting. Journal of the Association for Computing Machinery, 16(2):264-285.
Elena Filatova and Vasileios Hatzivassiloglou. 2004. Event-based extractive summarization. In Proceedings of the ACL-04 Workshop, 104-111.
Julian Kupiec, Jan Pedersen and Francine Chen. 1995. A trainable document summarizer. In Proceedings of the 18th ACM-SIGIR conference, 68-73.
Chin-Yew Lin and Eduard Hovy. 2003. Automatic 
Evaluation of Summaries Using N-gram Co-
occurrence Statistics. In Proceedings of HLT-
NAACL, Edmonton, Canada, May. 
H.P. Luhn. 1958. The automatic creation of literature 
abstracts. IBM Journal of Research and Develop-
ment 2:159-165. 
Daniel Marcu. 1997. The rhetorical parsing of natural language texts. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics (ACL'97), 96-103.
Dragomir R. Radev, Timothy Allison, Sasha Blair-
Goldensohn, John Blitzer, Arda Celebi, Stanko 
Dimitrov, Elliott Drabek, Ali Hakim, Wai Lam, 
Danyu Liu, Jahna Otterbacher, Hong Qi, Horacio 
Saggion, Simone Teufel, Michael Topper, Adam 
Winkel, Zhu Zhang. 2004. MEAD - a platform for 
multidocument multilingual text summarization. 
LREC 2004. 
Simone Teufel and Marc Moens. 1999. Argumenta-
tive classification of extracted sentences as a first 
step towards flexible abstracting. Advances in 
Automatic Text Summarization, Inderjeet Mani 
and Mark T. Maybury (editors), 137-154. Cam-
bridge, Massachusetts: MIT Press. 
Larry Page, Sergey Brin, et al. 1998. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report, Stanford University.
Lucy Vanderwende, Michele Banko, and Arul 
Menezes. 2004. Event-centric summary generation. 
Available at http://duc.nist.gov/pubs.html
Masaharu Yoshioka and Makoto Haraguchi. 2004. 
Multiple news articles summarization based on 
event reference information. In Working Notes of 
the Fourth NTCIR Workshop Meeting, National 
Institute of Informatics, 2004. 
 
