Automatic summarization of search engine hit lists 
Dragomir R. Radev 
School of Information, University of Michigan 
550 E. University St. 
Ann Arbor, MI 48109 
radev@umich.edu 
Weiguo Fan 
University of Michigan Business School 
701 Tappan St. 
Ann Arbor, MI 48109 
wfan@umich.edu 
Abstract 
We present our work on open-domain 
multi-document summarization in the 
framework of Web search. Our system, 
SNS (pronounced "essence"), retrieves 
documents related to an unrestricted user 
query and summarizes a subset of them as 
selected by the user. We present a task-based 
extrinsic evaluation of the quality of 
the produced multi-document summaries. 
The evaluation results show that 
summarization quality is relatively high 
and that the summaries help users read 
faster and judge the relevance of the 
retrieved URLs. 
1 Introduction 
Online information is becoming available at 
an exponential rate. According to a recent 
study by NetSizer (2000), the number of web 
hosts increased from 30 million in 
Jan. 1998 to 44 million in Jan. 1999, and to 
more than 70 million in Jan. 2000. More than 
2 million new hosts were added to the Internet 
in Feb. 2000, according to this report. Similar 
Internet growth results were reported by 
Internet Domain Service (IDS, 2000). The 
number of web pages on the Internet was 320 
million in Dec. 1997 as reported by 
Lawrence et al. (1997), 800 million in Feb. 
1999 (Lawrence et al. 1999), and more than 
1,720 million in March 2000 (Censorware, 
2000). The number of pages available on the 
Internet almost doubles every year. 
To help alleviate the information overload 
problem and help users find the information 
they need, many search engines have emerged. 
They build a huge centralized database to 
index a portion of the Internet, ranging from 
10 million to more than 300 million web 
pages. Search engines do help reduce the 
information overload problem by allowing a 
user to do a centralized search, but they also 
bring up another problem for the user: too 
many web pages are returned for a single 
query. To find out which documents are 
useful, the user often has to sift through 
hundreds of pages only to find that a few 
of them are relevant. Moreover, browsing 
through a long list of retrieval results is so 
tedious that few users are willing to go 
through it. This is why research results have 
shown that search engine users often give up 
their search on the first try, examining no more 
than 10 documents (Jansen et al. 2000). It 
would be very helpful if a search 
engine could classify the 
retrieved web pages into clusters and provide 
more contextual and summary information to 
help users explore the retrieval set more 
efficiently. 
Recent advances in information retrieval, 
natural language processing, and computational 
linguistics make it easier to build a helpful 
search engine based on summaries of hit lists. 
We describe in this paper a prototype system, 
SNS, which blends traditional information 
retrieval technology with advanced 
document clustering and multi-document 
summarization technology in an integrated 
framework. The following steps are performed 
for a given query: 
Figure 1: Architecture diagram 
The general architecture of our system is 
shown in Figure 1. User interaction with SNS 
can be done in three different modes: 
• Web search mode. The user enters a 
general-domain query in the search engine 
(MySearch). The result is a set of related 
documents (the hit-list). The user then 
selects which of the hits should be 
summarized. MEAD, the summarization 
component, produces a cross-document 
summary of the documents selected by the 
user from the hit list. 
• Intranet mode. The user indicates what 
collection of documents needs to be 
summarized. These documents are not 
necessarily extracted from the Web. 
• Clustering mode. The user indicates that 
either the hit list of the search engine or a 
stand-alone document collection needs to 
be clustered. CIDR, the clustering 
component, creates clusters of documents. 
For each cluster, MEAD produces a cross- 
document summary. 
Our paper is organized as follows. Sections 2-4 
describe the system. More specifically: 
Section 2 explains how the search engine 
operates, Section 3 deals with the clustering 
module while Section 4 presents the multi- 
document summarizer. Section 5 describes the 
user interface of the system. In Section 6, we 
present some experimental results. After we 
compare our work to related research in 
Section 7, we conclude the paper in Section 8. 
2 Search 
The search component of SNS is a 
personalized search engine called MySearch. 
MySearch utilizes a centralized relational 
database to store all the URL indexes and 
other related URL information. Spiders are 
used to fetch URLs from the Internet. After a 
URL is downloaded, the following steps are 
applied to index the URL: 
• Parse the HTML file and remove all 
tags. 
• Apply Porter's stemming algorithm to 
each keyword. 
• Remove stop words. 
• Index each keyword into the database 
along with its frequency and position 
information. 
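The indexing steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the MySearch implementation: `strip_tags`, `crude_stem`, `index_page`, and the tiny `STOP_WORDS` set are hypothetical names, the suffix stripper is a stand-in for Porter's algorithm, and an in-memory dictionary replaces the relational database.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stop list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is"}

def strip_tags(html):
    """Remove HTML tags with a simple regular expression (a simplification)."""
    return re.sub(r"<[^>]+>", " ", html)

def crude_stem(word):
    """Stand-in for Porter's stemmer: strips a few common suffixes only."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def index_page(html):
    """Return {keyword: {"freq": int, "positions": [int, ...]}} for one page."""
    tokens = re.findall(r"[a-z0-9]+", strip_tags(html).lower())
    index = defaultdict(lambda: {"freq": 0, "positions": []})
    for position, token in enumerate(tokens):
        stem = crude_stem(token)      # stemming first, as in the list above
        if stem in STOP_WORDS:        # then stop-word removal
            continue
        index[stem]["freq"] += 1
        index[stem]["positions"].append(position)
    return dict(index)

page = "<html><title>Clinton denies</title><body>The president denied it</body></html>"
idx = index_page(page)
print(idx["deni"])
```

Because stemming runs before stop-word removal in this sketch, the stop list is matched against stems rather than surface forms.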
The contents of URLs are indexed based on 
the locations of the keywords: Anchor, Title, 
and Body. This allows weighted retrieval 
based on different word positions. For 
example, a user can assign a weight of 5 to 
a keyword appearing in the title, 4 to one in 
anchor text, and 2 to one in the body. These 
weights can be saved in the user's personal 
profile and used for later weighted ranking. 
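A minimal sketch of position-weighted ranking, using the example weights from the text (title 5, anchor 4, body 2). The text does not say how MySearch combines the per-location counts, so the weighted sum below is an assumption, and `weighted_score` and the sample pages are hypothetical.

```python
# Example profile from the text: title 5, anchor 4, body 2.
DEFAULT_WEIGHTS = {"title": 5, "anchor": 4, "body": 2}

def weighted_score(term_counts, weights=DEFAULT_WEIGHTS):
    """Sum position-weighted occurrence counts of the query keyword.

    term_counts maps a location name to the number of times the keyword
    occurs there. A plain weighted sum is assumed here.
    """
    return sum(weights.get(loc, 0) * count for loc, count in term_counts.items())

# Hypothetical keyword counts for two pages.
pages = {
    "page_a": {"title": 1, "anchor": 0, "body": 3},  # 5 + 0 + 6 = 11
    "page_b": {"title": 0, "anchor": 2, "body": 1},  # 0 + 8 + 2 = 10
}
ranking = sorted(pages, key=lambda p: weighted_score(pages[p]), reverse=True)
print(ranking)
```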
Besides the weighted search, MySearch also 
supports Boolean search and Vector Space 
search (Salton, 1989). For the vector space 
model, TF-IDF is used for ranking. 
We used a modified version of TF-IDF: 
$\log(\mathit{tf} + 0.5) \cdot \log(N / \mathit{df})$, 
where tf is the number of times a term appears 
in the content of a URL, N is the total number of 
documents in the text collection, and df is 
the number of unique URLs in the entire 
collection in which the term appears. 
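The modified TF-IDF weight can be written out directly. The base of the logarithm is not stated in the text, so the natural logarithm is assumed here. The guard for terms that do not occur is also an assumption, added because log(0.5) would otherwise give a negative weight. `modified_tf_idf` is a hypothetical name.

```python
import math

def modified_tf_idf(tf, df, n_docs):
    """log(tf + 0.5) * log(N / df), with natural logarithms (assumed)."""
    if tf <= 0 or df <= 0:
        return 0.0  # assumption: absent terms contribute nothing
    return math.log(tf + 0.5) * math.log(n_docs / df)

# A term occurring 3 times in a page and in 10 of 1000 indexed pages.
rare = modified_tf_idf(tf=3, df=10, n_docs=1000)
# The same term frequency for a term found in every page scores zero.
common = modified_tf_idf(tf=3, df=1000, n_docs=1000)
print(rare, common)
```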
A user can choose which search method to 
use and can also combine 
Boolean search with Vector Space search. 
These options give users more 
flexibility in controlling the retrieval results, as 
past research indicates that different ranking 
functions perform differently (Salton, 
1989). 
A sample search for "Clinton" using the TF- 
IDF Vector Space search is shown in Figure 3. 
The keyword "Clinton" is highlighted using a 
different color to help users get more 
contextual information. The retrieval status 
value is shown in a bold black font after the 
URL title. 
3 Clustering 
Our system uses two types of clustered input: 
either the set of hits that the user has selected 
or the output of our own clustering engine, 
CIDR (Columbia Intelligent Document 
Relater). CIDR is described in (Radev et al., 
1999). It uses an iterative algorithm that 
creates, as a side product, so-called "document 
centroids". The centroids contain the words 
most relevant to the entire cluster (not 
to the user query). We use these words to find 
the most salient "themes" in the cluster of 
documents. 
3.1 Finding themes within clusters 
One of the underlying assumptions behind 
SNS is that when a user selects a set of hits 
after reading the single-document summaries 
from the hit list retrieved by the system, he or 
she selects documents which appear to be 
related to one or more common themes. The 
multi-document summarization algorithm 
attempts to identify these themes and the 
most salient passages from the 
selected documents. To do so, it uses a 
pseudo-document called the cluster centroid, 
which is computed automatically from the 
entire list of hits selected by the user. 
3.2 Computing centroids 
Figure 2 shows a sample cluster 
centroid. The TF column indicates the average 
term frequency of a given term within the 
cluster. E.g., a TF value of 13.33 for three 
documents indicates that the term "deny" 
appears 40 times in the three documents. The 
IDF values are computed from a mixture of 
200 MB of news and web-based documents. 
Term          TF      IDF     Score 
app           20.67    8.90   183.88 
lewinsky      34.67    5.25   182.03 
currie        15.33    7.60   116.50 
ms            32.00    3.06    97.97 
january       25.33    3.30    83.60 
jordan        18.67    4.06    75.81 
referral       9.00    7.43    66.88 
magaziner      6.67   10.00    66.64 
deny          13.33    4.92    65.61 
admit         13.00    4.92    63.97 
monica        14.67    4.29    62.85 
oic            5.67   10.00    56.64 
betty          8.00    6.01    48.06 
vernon         8.67    5.49    47.54 
do            32.67    1.40    45.80 
telephoned     6.67    6.86    45.74 
you           36.33    1.19    43.30 
i             42.67    0.96    40.84 
clinton       16.33    2.23    36.39 
jones         11.33    3.17    35.88 
or            32.33    1.09    35.20 
gif            3.33    9.30    31.01 
white         12.00    2.50    30.01 
tripp          4.67    6.23    29.10 
ctv            3.00    9.30    27.91 
december       7.33    3.71    27.19 
Figure 2: A sample cluster centroid 
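In Figure 2 the Score column appears to be the product of the average TF and the IDF (for example, 34.67 * 5.25 is about 182.0 for "lewinsky"). The sketch below reproduces only that relationship, not CIDR's iterative clustering. The function name and the per-document split of the 40 occurrences of "deny" are hypothetical.

```python
def centroid_scores(doc_term_counts, idf):
    """Average each term's count over the cluster and multiply by its IDF.

    doc_term_counts: one {term: count} dict per document in the cluster.
    idf: {term: IDF value}; terms without an IDF entry score 0 (assumption).
    """
    n_docs = len(doc_term_counts)
    totals = {}
    for counts in doc_term_counts:
        for term, count in counts.items():
            totals[term] = totals.get(term, 0) + count
    return {t: (total / n_docs) * idf.get(t, 0.0) for t, total in totals.items()}

# Three documents in which "deny" appears 40 times in total (split assumed).
cluster = [{"deny": 15}, {"deny": 12}, {"deny": 13}]
scores = centroid_scores(cluster, idf={"deny": 4.92})
print(round(scores["deny"], 2))
```

With the rounded IDF of 4.92 this gives about 65.60, close to the 65.61 listed in the figure. The small gap is presumably due to rounding of the published IDF value.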
4 Centroid-based summarization 
The main technique that we use for 
summarization is sentence extraction. We 
score each sentence within a cluster 
individually and output those that score the highest. 
A more detailed description of the summarizer 
can be found in (Radev et al., 2000). 
The input to the summarization component is 
a cluster of documents. These documents can 
be either the result of a user query or the 
output of CIDR. 
The summarizer takes as input a cluster of 
documents with a total of n sentences as well 
as a compression ratio parameter r which 
indicates how much of the original cluster to 
preserve. 
The output consists of a sequence of [n * r] 
sentences from the original documents in the 
same order as the input documents. The 
highest-ranking sentences are included 
according to the scoring formula below: 
$S_i = w_c C_i + w_p P_i + w_f F_i$ 
In the formula, $w_c$, $w_p$, and $w_f$ are weights. $C_i$ is the 
centroid score of the sentence, $P_i$ is the 
positional score of the sentence, and $F_i$ is the 
score of the sentence according to the overlap 
with the first sentence of the document. 
4.1 Centroid value 
The centroid value $C_i$ for sentence $S_i$ is 
computed as the sum of the centroid values $C_w$ 
of all words in the sentence. For example, the 
sentence "President Clinton met with Vernon 
Jordan in January" gets a score of 243.34, 
which is the sum of the individual centroid 
values of the words (clinton = 36.39; vernon = 
47.54; jordan = 75.81; january = 83.60). 
$C_i = \sum_{w} C_w$ 
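The worked example can be checked with a few lines of code. This is a sketch: `centroid_value` is a hypothetical name, the tokenizer is a simple regular expression, and words without a centroid entry are assumed to contribute zero, which matches the four-term sum in the example.

```python
import re

# Centroid scores for the four words used in the example (from Figure 2).
CENTROID = {"clinton": 36.39, "vernon": 47.54, "jordan": 75.81, "january": 83.60}

def centroid_value(sentence, centroid=CENTROID):
    """Sum the centroid scores of the words in a sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return sum(centroid.get(w, 0.0) for w in words)

score = centroid_value("President Clinton met with Vernon Jordan in January")
print(round(score, 2))  # 243.34
```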
4.2 Positional value 
The positional value is computed as follows: 
the first sentence in a document gets the same 
score $C_{\max}$ as the highest-ranking sentence in 
the document according to the centroid value. 
The score for each sentence i within a document 
of n sentences is computed according to the 
following formula: 
$P_i = \frac{n - i + 1}{n} \cdot C_{\max}$ 
For example, if the sentence described above 
appears as the third sentence out of 30 in a 
document and the largest centroid value of any 
sentence in the given document is 917.31, the 
positional value $P_3$ will be 28/30 * 917.31, or about 856.16. 
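The positional score decays linearly from the document's largest centroid value down to 1/n of it. A small sketch, with `positional_value` as a hypothetical name and 1-based sentence positions as in the example:

```python
def positional_value(i, n, c_max):
    """(n - i + 1) / n * c_max for the i-th of n sentences (1-based)."""
    if not 1 <= i <= n:
        raise ValueError("sentence position must be between 1 and n")
    return (n - i + 1) / n * c_max

p3 = positional_value(i=3, n=30, c_max=917.31)
print(round(p3, 2))  # 856.16
```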
4.3 First-sentence overlap 
The overlap value is computed as the inner 
product of the sentence vectors for the current 
sentence i and the first sentence of the 
document. The sentence vectors are 
representations of the words in each sentence, 
with one dimension per word, whereby the value 
at a given position of a sentence vector is the 
number of occurrences of the corresponding word in the sentence. 
$F_i = \vec{S_1} \cdot \vec{S_i}$ 
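With word-count vectors, the inner product reduces to summing the products of counts for the words two sentences share. The sketch below assumes simple lowercase tokenization. The function names and example sentences are hypothetical.

```python
import re
from collections import Counter

def word_counts(sentence):
    """Sparse word-count vector for a sentence."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))

def first_sentence_overlap(first, current):
    """Inner product of the two word-count vectors."""
    a, b = word_counts(first), word_counts(current)
    return sum(a[w] * b[w] for w in a if w in b)

first = "Clinton met Jordan in January"
current = "Jordan said Clinton called in January"
print(first_sentence_overlap(first, current))  # 4
```

Under this definition the first sentence always overlaps with itself, so it receives a nonzero F value by construction.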
4.4 Combining the three parameters 
As indicated in (Radev et al., 2000), we have 
experimented with several weighting schemes 
for the three parameters (centroid, position, 
and first-sentence overlap). So far, 
the three weights $w_c$, $w_p$, and $w_f$ are neither 
automatically learned nor derived from a user 
profile. Instead, we have experimented with 
various sets of empirically determined values 
for the weights. In this paper the results are 
based on equal weights for the three 
parameters: $w_c = w_p = w_f = 1$. 
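Putting the three components together, extraction can be sketched as follows. This is an illustration under assumptions rather than the MEAD implementation: sentences are given as pre-tokenized word lists, the number of extracted sentences is taken as n * r rounded to the nearest integer (the exact rounding is not specified in the text), and `summarize` is a hypothetical name.

```python
from collections import Counter

def summarize(documents, centroid, ratio, w_c=1.0, w_p=1.0, w_f=1.0):
    """Return the top-scoring sentences in their original order.

    documents: list of documents; each document is a list of sentences;
               each sentence is a list of lowercase words.
    centroid:  {word: centroid score}.
    ratio:     compression ratio r (fraction of sentences to keep).
    """
    scored = []  # (score, original order, sentence)
    for doc in documents:
        if not doc:
            continue
        n = len(doc)
        c_values = [sum(centroid.get(w, 0.0) for w in sent) for sent in doc]
        c_max = max(c_values)
        first = Counter(doc[0])
        for i, sent in enumerate(doc, start=1):
            c = c_values[i - 1]                          # centroid value
            p = (n - i + 1) / n * c_max                  # positional value
            counts = Counter(sent)
            f = sum(first[w] * counts[w] for w in counts)  # first-sentence overlap
            scored.append((w_c * c + w_p * p + w_f * f, len(scored), sent))
    k = max(1, round(len(scored) * ratio))  # rounding rule is an assumption
    top = sorted(scored, key=lambda t: t[0], reverse=True)[:k]
    return [sent for _, _, sent in sorted(top, key=lambda t: t[1])]

docs = [
    [["clinton", "met", "jordan"], ["weather", "was", "nice"], ["jordan", "denied", "it"]],
    [["lewinsky", "testified"], ["nothing", "new"]],
]
cent = {"clinton": 36.39, "jordan": 75.81, "lewinsky": 182.03}
summary = summarize(docs, cent, ratio=0.4)
print(summary)
```

With equal weights, the first sentence of each toy document wins because it scores well on all three components at once.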
5 User Interface 
We describe in this section the user interface 
for web search mode as described earlier in 
Section 1. 
One component of our system is the search 
engine (MySearch). The detailed design of the 
search component is discussed in Section 2. 
The result of a sample query "Clinton" to our 
search engine is shown starting in Figure 4. 
Figure 3: Sample user query 
A user has the option to choose a specific 
ranking function as well as the number of 
retrieval results to be shown in a single screen. 
The keyword contained in the query string will 
be automatically highlighted in the search 
results to provide contextual information for 
the user. 
The overall interface for SNS is shown in 
Figure 4. On the top right of the frame is the 
MySearch search engine. When a user 
submits a query, the screen in Figure 5 
appears. As can be seen from Figure 5, there is 
a check box next to each retrieved record. 
This allows the user to tell the summarization 
engine which documents to 
summarize. After the user clicks the 
summarization button, the summarization 
option screen is displayed, as shown at the bottom 
of Figure 6. The summarization option screen 
allows a user to specify the summarization 
compression ratio. Figure 7 shows the 
summarization result for four URLs with the 
compression ratio set to 30%. 
Figure 4: SNS interface (framed) 
Figure 5: Search output along with user selection of documents to be summarized 
Figure 6: Selected documents for summarization 
Figure 7: Output of the summarizer 
The following information is shown in the 
summarization result screen in Figure 7: 
• The number of sentences in the text of the 
set of URLs that the user selected 
• The number of sentences in the summary 
• The sentences representing the themes of those 
selected URLs and their relative scores. The 
sentences are ordered the same way they appear 
in the original set of documents. 
6 Experimental results 
Our system was evaluated using the task-based 
extrinsic measure as suggested in (Mani et al. 
1999). The experiment was set up as follows: 
Three sets of documents on different topics were 
selected prior to the experiment. The topics and 
their corresponding document information are 
shown in Table 1. 
Topic                                                  No.   Length 
S1                                                           200k 
S2  Introduction to Data Mining                              100k 
S3  Intelligent Agents and their application in 
    Information retrieval                              5     160k 
Table 1: Evaluation Topics and their corresponding document set information 
The term data mining is then this high-level application techniques / tools used to 
present and analyze data for decision makers 
Figure 8: A sample of the summarization result for S2 at 10% compression rate 
As Table 1 shows, the articles in topic set S1 are 
longer than those in S2 and S3. The articles 
in S3 are the shortest, averaging 32k each. 
The number of documents in each topic set is 
also different. The variations in document length 
and in the number of documents in each topic 
set help test the robustness of our 
summarization algorithms. 
We used SNS to generate both 10% and 20% 
summaries for each topic. A sample of the 10% 
summary for topic S2 is shown in Figure 8. Four 
users were selected to evaluate these 
summarization results. Each user was asked to 
read through the set of full articles for each topic 
first, followed by its corresponding 10% and 
20% summaries. After the four users finished 
each set, they were asked to assign a readability 
score (1-10) to each summary. The higher the 
readability score, the more readable and 
comprehensible the summary. 
The time spent reading both the full articles and 
the summaries was tracked and recorded. 
Table 2: Summarization evaluation: detailed results 
7.92 
Table 3: Summary of the evaluation results 
The detailed evaluation results are shown in 
Table 2, and Table 3 summarizes them. 
Table 2 shows that the four users 
have different reading speeds. However, each 
user's reading speed is fairly consistent across the three 
topics. The summaries generated by SNS are 
also very readable. For example, the average 
readability score (obtained by 
averaging the readability scores assigned by the 
four users) for the 10% and 20% summaries of 
topic S1 is 8 and 8, respectively. For topic S3, the 
average readability score for the 10% and 20% 
summaries is 7.75 and 8.75, respectively. 
Similarly, for S2 the average readability score 
for the 10% and 20% summaries is 8 and 8.5, 
respectively. The differences in the average 
readability score also suggest that (a) our 
summarizer favors longer documents over 
shorter documents, and (b) 20% summaries are 
generally preferred over 10% summaries. The 
difference in the readability score between 10% 
and 20% summaries is larger in S3 (diff = 1.0) 
than in S1 (diff = 0). These findings 
raise interesting questions for future research. 
As can be seen from Table 3, the 20% summary 
achieves a better overall readability score than 
the 10% summary. The speedup of the 10% 
summary over the full articles is 6.87. That is, with 
the reading material reduced to one tenth of its 
original length (an optimal speedup of 10), the 
measured speedup in reading is only 6.87. This 
suggests that the 10% summaries are somewhat 
harder to read. This may be due to the simple 
sentence boundary detection algorithm we used. 
Feedback from the users in the evaluation seems 
to confirm this explanation. As more sentences 
were included in the 20% summaries, the 
speedup in reading (4.22) came close to the 
optimal speedup ratio (5.0).¹ 
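The speedup arithmetic can be made explicit. A summary at compression ratio r has an optimal reading speedup of 1/r, so the measured speedups can be expressed as a fraction of that optimum. The "efficiency" measure below is a hypothetical convenience for this illustration, not a quantity reported in the evaluation.

```python
def optimal_speedup(ratio):
    """Best-case reading speedup for a summary of the given length ratio."""
    return 1.0 / ratio

def efficiency(measured_speedup, ratio):
    """Measured speedup as a fraction of the optimal speedup."""
    return measured_speedup / optimal_speedup(ratio)

# Measured speedups reported in the text for 10% and 20% summaries.
for ratio, measured in [(0.10, 6.87), (0.20, 4.22)]:
    print(ratio, optimal_speedup(ratio), round(efficiency(measured, ratio), 3))
```

By this measure the 10% summaries reach about 69% of their optimal speedup and the 20% summaries about 84%.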
7 Related Work 
Neto et al. (2000) describe a text mining tool 
that performs document clustering and text 
summarization. They used the Autoclass 
algorithm to perform document clustering and 
used TF-ISF (an adaptation of TF-IDF) to 
rank sentences and generate the 
summarization output. Our work differs 
from theirs in that we perform personalized 
summarization based on the retrieval results from 
a generic personalized web-based search engine. 
A more complex sentence ranking function 
is employed to boost the ranking performance. 
The compression ratio for the summary is 
customizable by the user. Both single-document 
summarization for a single URL and multi-document 
summarization for a cluster of URLs are 
supported in our system. 

¹ Since the length of the summary is only 20% of the 
original documents, the maximum speedup in terms of 
reading time is 1/0.2 = 5. 
More related work can be found on the Extractor 
web site, http://extractor.iit.nrc.ca/. They use 
MetaCrawler to perform web-based search and 
automatically generate a summary for each 
URL retrieved. They support only single-document 
summarization in their engine, and the 
compression rate of the summarizer is not 
customizable. We not only support both single- 
and multi-document summarization, but also 
allow the user to specify the summarization 
compression ratio and to get per-cluster 
summaries of automatically generated clusters, 
which, we believe, are more valuable to online 
users and give them more flexibility and control 
over the summarization results. 
8 Conclusion and Future Work 
We described in this paper a prototype system, 
SNS, which integrates natural language 
processing and information retrieval techniques 
to perform automatic customized summarization 
of search engine results. The user interface and 
detailed design of SNS's components were also 
discussed. A task-based extrinsic evaluation 
showed that the system is of reasonably high 
quality. 
The following issues will be addressed in the 
future. 
8.1 Interaction between sentence inclusion 
in a summary 
There are two types of interaction (or 
reinforcement) between sentences in a summary: 
negative and positive. 
Negative interaction occurs when the inclusion 
of one sentence in the summary indicates that 
another sentence should not appear in the 
summary. This is particularly relevant to 
multi-document summarization, where 
negative interaction models the non-inclusion of 
redundant information. 
The case of positive interaction involves positive 
reinforcement between sentences. For example, 
if a sentence with a referring expression is to be 
included in a summary, typically the sentence 
containing the antecedent should also be added. 
We will investigate specific setups in which 
positive and/or negative reinforcement between 
sentences is practical and useful. 
8.2 Personalization 
We will investigate additional techniques for 
producing personalized summaries. Some of the 
approaches that we are considering are: 
• Query words: favoring sentences that 
include words from the user query in the 
Web-based scenario 
• Personal preferences and interaction history: 
we would favor sentences that match the 
user profile (e.g., overlapping with his or her 
long-term interests and/or recent queries 
logged by the system). 
8.3 Technical limitations 
The current version of our system uses a fairly 
basic sentence delimiting component. We will 
investigate the use of robust sentence boundary 
identification modules in the future. 
We will also investigate the possibility of adding a 
limited form of anaphora resolution. 
8.4 Availability 
A demonstration version of SNS is available at 
the following URL: 
http://www.si.umich.edu/~radev/ssearch/ 

References 
Carbonell, J. and Goldstein, J. (1998). The use of 
MMR, Diversity-Based Reranking for Reordering 
Documents and Producing Summaries. Poster 
Session, SIGIR'98, Melbourne, Australia. 
Censorware (2000). 
http://www.censorware.org/web_size/. 
Extractor (2000). http://extractor.iit.nrc.ca/. 
IDS (2000). Internet Domain Survey. 
http://www.isc.org/ds/. 
Jansen, B. J., Spink, A., and Saracevic, T. (2000). 
Real life, real users, and real needs: a study and 
analysis of user queries on the web. Information 
Processing and Management, 36(2), 207-227. 
Lawrence, S., and Giles, C. L. (1997). Searching the 
World Wide Web. Science, 280(3), 98-100. 
Lawrence, S., and Giles, C. L. (1999). Accessibility of 
information on the web. Nature, 400, 107-109. 
Mani, I. and Bloedorn, E. (1999). Summarizing 
similarities and differences among related 
documents. Information Retrieval 1(1): 35-67. 
Mani, I., House, D., Klein, G., Hirschman, L., Obrst, 
L., Firmin, T., Chrzanowski, M., and Sundheim, B. 
(1998). The TIPSTER SUMMAC Text 
Summarization Evaluation. The MITRE 
Corporation Technical Report MTR 98W0000138, 
McLean, Virginia. 
McKeown, K. and D. R. Radev. Generating Summaries 
of Multiple News Articles. Proceedings, ACM 
Conference on Research and Development in 
Information Retrieval SIGIR'95 (Seattle, WA, July 
1995). 
NetSizer (2000). http://www.netsizer.com/. 
Neto, J. L., Santos, A. D., Kaestner, C. A. A., and 
Freitas, A. A. (2000). Document clustering and text 
summarization. In Proceedings, 4th Int. Conference 
on Practical Applications of Knowledge Discovery 
and Data Mining (PADD-2000), 41-55. London: 
The Practical Application Company. 
Radev, D. R., Hatzivassiloglou, V., and McKeown, 
K. A Description of the CIDR System as Used for 
TDT-2. Proceedings, DARPA Broadcast News 
Workshop, (Herndon, VA, February 1999). 
Radev, D. R., Jing, H., and Stys-Budzikowska, M. 
Summarization of multiple documents: clustering, 
sentence extraction, and evaluation. Proceedings, 
ANLP-NAACL Workshop on Automatic 
Summarization, (Seattle, WA, April 2000). 
Salton, G. (1989). Automatic Text Processing. 
Addison-Wesley Publishing Co., Reading, MA, 
1989. 
