Automatic Text Summarization by Paragraph Extraction
Mandar Mitra†*, Amit Singhal‡, Chris Buckley††
†Department of Computer Science, Cornell University; mitra@cs.cornell.edu
‡AT&T Labs Research; singhal@research.att.com
††Sabir Research, Inc.; chrisb@sabir.com
Abstract 
Over the years, the amount of information available electronically has
grown manifold. There is an increasing demand for automatic methods for
text summarization. Domain-independent techniques for automatic
summarization by paragraph extraction have been proposed in (Salton et al.,
1994; Salton et al., 1996b). In this study, we attempt to evaluate these
methods by comparing the automatically generated extracts to ones generated
by humans. In view of the fact that extracts generated by two humans for
the same article are surprisingly dissimilar, the performance of the
automatic methods is satisfactory. Even though this observation calls into
question the feasibility of producing perfect summaries by extraction,
given the unavailability of other effective domain-independent
summarization tools, we believe that this is a reasonable, though
imperfect, alternative.
1 Introduction 
As the amount of textual information available electronically grows
rapidly, it becomes more difficult for a user to cope with all the text
that is potentially of interest. Automatic text summarization methods are
therefore becoming increasingly important. Consider the process by which a
human accomplishes this task. Usually, the following steps are involved
(Brandow et al., 1995):
1. understanding the content of the document,
2. identifying the most important pieces of information contained in it,
3. writing up this information.
Given the variety of available information, it would be useful to have
domain-independent, automatic techniques for doing this. However,
automating the first and third steps for unconstrained texts is currently
beyond the state of the art (Brandow et al., 1995). Thus, the process of
automatic summary generation generally reduces to the task of extraction,
i.e., we use heuristics based upon a detailed statistical analysis of word
occurrence to identify the text-pieces (sentences, paragraphs, etc.) that
are likely to be most important, and concatenate the selected pieces
together to form the final extract¹ (Luhn, 1958; Earl, 1970).
Techniques for sentence extraction have been proposed in (Brandow et al.,
1995; Luhn, 1958; Paice, 1990; Kupiec et al., 1995). In (Salton et al.,
1994; Salton et al., 1996b), the paragraph is chosen as the unit of
extraction. It was expected that since a paragraph provides more context,
the problems of readability and coherence that were seen in the summaries
generated by sentence extraction would be, at least partially, ameliorated.
Various properties of the extracts generated by different paragraph
selection algorithms were observed in previous studies. In this study, we
intend to do a more detailed evaluation of these different algorithms.
The remainder of the paper is organized as follows: Section 2 briefly
introduces text relationship maps, which constitute the main tool used in
our extraction schemes, and outlines the paragraph selection algorithms;
Section 3 describes the experiments we conducted in order to evaluate these
algorithms; Section 4 discusses the evaluation method we adopted and the
results of our experiments; finally, Section 5 concludes the study.
*This study was supported in part by the National Science Foundation under
grant IRI-9300124.
2 Background
2.1 Text Relationship Maps 
Usually, in information retrieval, each text or text excerpt is represented
by a vector of weighted terms of the form
$D_i = (d_{i1}, d_{i2}, \ldots, d_{it})$, where $d_{ik}$ represents an
importance weight for term $T_k$ attached to document $D_i$. The terms
attached to documents for content representation purposes may be words or
phrases derived from the document texts by an automatic indexing procedure,
and the term weights are computed by taking into account the occurrence
characteristics of the terms in the individual documents and the document
collection as a whole (Salton and McGill, 1983). Assuming that every text
or text excerpt is represented in vector form as a set of weighted terms,
it is possible to compute pairwise similarity coefficients, showing the
similarity between pairs of texts, based on coincidences in the term
assignments to the respective items. Typically, the vector similarity might
be computed as the inner product between corresponding vector elements,
that is,
$$\mathrm{Sim}(D_i, D_j) = \sum_{k=1}^{t} d_{ik} \, d_{jk},$$
and the similarity function might be normalized to lie between 0 for
disjoint vectors and 1 for completely identical vectors (Salton, 1989). The
Smart information retrieval system (Salton, 1971) is based on these
principles and is used in our experiments.
¹ Henceforth, the term summary is used in this sense of a representative
extract.
Figure 1: Text relationship map for article Telecommunications
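The normalized inner product described in this section can be illustrated with a minimal sketch. The dictionary representation of a term vector and the function name `cosine_similarity` are assumptions made for this example; the sketch does not reproduce the term weighting or the interface of the Smart system.

```python
import math


def cosine_similarity(d_i, d_j):
    """Normalized inner product of two sparse term-weight vectors.

    Each vector is a dict mapping a term to its weight (an assumed
    representation). With non-negative weights the result lies between
    0 (no terms in common) and 1 (identical vectors, up to rounding).
    """
    # Inner product over the terms the two vectors share.
    inner = sum(w * d_j[t] for t, w in d_i.items() if t in d_j)
    norm_i = math.sqrt(sum(w * w for w in d_i.values()))
    norm_j = math.sqrt(sum(w * w for w in d_j.values()))
    if norm_i == 0.0 or norm_j == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return inner / (norm_i * norm_j)
```

Dividing by the two vector lengths is one common way to obtain the 0-to-1 normalization mentioned in the text; other normalizations are possible.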
In order to decide which paragraphs of a document are most useful for text
summarization, we first want to determine how the paragraphs are related to
each other. This task is accomplished using a text relationship map. A text
relationship map is a graphical representation of textual structure, in
which paragraphs (in general, pieces of text) are represented by nodes on a
graph and related paragraphs are linked by edges (Salton and Allan, 1993).
Nodes are joined by links based on a numerical similarity computed for each
pair of texts using the information retrieval techniques described above.
Typically, a threshold value is selected, and all pairs of paragraphs whose
similarity exceeds the threshold are connected by links. Since the
similarity between two text vectors is based upon the vocabulary overlap
between the corresponding texts, if the similarity between two vectors is
large enough (above a threshold) to be regarded as non-random, we can say
that the vocabulary matches between the corresponding texts are meaningful,
and the two texts are "semantically related" (Salton et al., 1997).
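As a small illustration of the thresholding step described above, the sketch below turns a matrix of pairwise paragraph similarities into a map of links. The matrix layout, the 0-based paragraph indices, and the name `build_relationship_map` are assumptions made for this example only.

```python
def build_relationship_map(similarity, threshold):
    """Link every pair of paragraphs whose similarity exceeds the threshold.

    `similarity` is a symmetric n x n list of lists of pairwise paragraph
    similarities. The result maps each paragraph index to the set of
    paragraph indices it is linked to.
    """
    n = len(similarity)
    links = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if similarity[i][j] > threshold:
                # Links are undirected, so record both directions.
                links[i].add(j)
                links[j].add(i)
    return links
```

The number of entries in `links[i]` is then the number of links incident on paragraph `i`.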
Figure 1 shows a typical text relationship map. The paragraphs of the
article Telecommunications (from the Funk and Wagnalls Encyclopedia (Funk
and Wagnalls, 1979)) are denoted by nodes. Paragraphs which are
sufficiently similar are joined by a link. The similarity threshold used in
this map is 0.12. Important conclusions about text structure can be drawn
from a text relationship map. For example, the importance of a paragraph
within the text is likely to be related to the number of links incident on
the corresponding node. The map can be used to identify related passages
covering particular topic areas. It also provides information about the
homogeneity of the text under consideration. When the map is well connected
and has many cross-links between paragraphs, and direct links between
adjacent paragraphs, one expects a unified, homogeneous treatment of the
topic (Salton et al., 1996b).
Figure 2: Text segmentation for article Telecommunications
A text relationship map may be used to decompose a document into segments
(Salton et al., 1996a). A segment is a contiguous piece of text that is
linked internally, but largely disconnected from the adjacent text (Hearst
and Plaunt, 1993). Segments are our (automatic) approximation to sectioning
when a text does not have well defined sections (as is the case with
numerous articles on the web these days). Consider Figure 2, for example.
It shows the relationship map for the article on Telecommunications at a
similarity threshold of 0.12 with links between distant paragraphs
(paragraphs that are more than five apart) deleted. Paragraphs 3 to 12 are
linked to each other, but there are few links connecting them to other
nearby paragraphs. This suggests that these paragraphs deal with one topic,
and the topic shifts from paragraph 12 to 14. Thus, paragraphs 3 to 12 form
a segment. On reading the text, we find that they, in fact, deal with the
devices and hardware used in telecommunications, and the topic shifts from
paragraph 14 to a discussion of the software used in telecommunications.²
Similarly, paragraphs 28 to 35 form a segment, and this segment describes
the public telecommunication services like electronic mail. Paragraphs 39
and 40 form the last segment on standards in telecommunication. For the
algorithm used to automatically generate segments for a document, see
(Salton et al., 1996b; Salton et al., 1996a).
² Paragraph 13 is actually the heading for the Software section. Since
heading paragraphs are not full-text and are not available in all domains,
we do not leverage their presence in our summarization algorithms.
2.2 Text Traversal
We now come to the problem of generating summaries by selecting paragraphs
of the document for inclusion. This could be accomplished by automatically
identifying the important paragraphs on the map and traversing the selected
nodes in text order to construct an extract, or path. Various criteria may
be used to associate importance with paragraphs, giving rise to different
paths. In this study, we evaluate four types of paths.
Bushy path
The bushiness of a node on a map is defined as the number of links
connecting it to other nodes on the map. Since a highly bushy node
(paragraph) is related to a number of other nodes, it has an overlapping
vocabulary with many other paragraphs and is likely to discuss topics
covered in several paragraphs. Such paragraphs are good overview paragraphs
and are desirable in a summary, and therefore are good candidates for
extraction. A global bushy path is constructed out of the n most bushy
nodes on the map, where n is the targeted number of paragraphs in the
summary. These nodes are arranged in chronological order, i.e., the order
in which they appear in the original document, to form the summary.
Figure 3: Global bushy and depth-first paths for article Telecommunications
20% Global Bushy Path
Para 3: Telecommunications, broadly speaking, the process of transmitting
information in an electronic form between any two devices by using any kind
of transmission line. More specifically, however, telecommunications refers
to the process ...
Para 5: The devices used in telecommunications can be computers, terminals
(devices that transmit and receive information), and peripheral equipment
such as printers (see Computer, and see Office Systems). The transmission
line used ...
Para 14: Among the different kinds of software are terminal-emulation,
file-transfer, host, and network software. Terminal-emulation software
makes it possible for a device to perform the same functions as a terminal.
File-transfer software is ...
Para 16: Three major categories of telecommunication applications can be
discussed here: host-terminal, file-transfer, and computer-network
communications.
Para 22: In file-transfer communications, two devices are connected: either
two computers, two terminals, or a computer and a terminal. One device then
transmits an entire data or program file to the other device. For example,
a person ...
Table 1: Text for global bushy path for article Telecommunications
Depth-first path
The nodes on a bushy path are connected to a number of other paragraphs,
but not necessarily to each other. Therefore, while they may provide
comprehensive coverage of an article, they may not form a very coherent
extract, and the readability of the summary might be poor. To avoid this
problem, we use the following strategy to build depth-first paths: start at
an important node (the first node or a highly bushy node are typical
choices) and visit the next most similar node at each step. Note that only
the paragraphs that follow the current one in text order are candidates for
the next step. Since each paragraph is similar to the next one on the path,
abrupt transitions in subject matter should be eliminated, and the extract
should be a coherent one. However, since the subject matter of the
paragraphs on the path is dictated to some extent by the contents of the
first paragraph, all aspects of the article may not be covered by a
depth-first path (Salton and Singhal, 1995; Salton et al., 1996b).
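The two global paths described above can be sketched as follows. This is a simplified illustration, not the implementation used in the experiments: the function names, the 0-based paragraph indices, the preference for earlier paragraphs when there is a tie, and the choice to follow raw similarities without checking the link threshold are all assumptions made for the example.

```python
def bushy_path(links, n):
    """Return the n paragraphs with the most links, in text order.

    `links` maps each paragraph index to the set of paragraphs it is
    linked to. Ties are broken in favor of the earlier paragraph
    (an assumption; the text does not specify a tie-breaking rule).
    """
    ranked = sorted(links, key=lambda node: (-len(links[node]), node))
    return sorted(ranked[:n])


def depth_first_path(similarity, start, n):
    """Start at `start` and repeatedly move to the most similar later paragraph.

    `similarity` is an n x n list of lists of pairwise similarities. Only
    paragraphs that follow the current one in text order are candidates,
    so the path may end early when the last paragraph is reached.
    """
    path = [start]
    current = start
    while len(path) < n:
        later = range(current + 1, len(similarity))
        if not later:
            break  # no paragraphs left after the current one
        current = max(later, key=lambda j: similarity[current][j])
        path.append(current)
    return path
```

Because the depth-first walk only moves forward in the text, its output is already in text order, whereas the bushy selection has to be sorted back into text order after ranking.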
Segmented bushy path
Some articles contain segments dealing with a specialized topic. The
paragraphs in such a segment would be well connected to each other, but
poorly connected to other paragraphs. A bushy path would not include these
paragraphs, and would thereby completely exclude an aspect of the subject
matter covered in the article. A segmented bushy path attempts to remedy
this problem. It is obtained by constructing bushy paths individually for
each segment and concatenating them in text order. At least one paragraph
is selected from each segment. The remainder of the extract is formed by
picking more bushy nodes from each segment in proportion to its length.
Since all segments are represented in the extract, this algorithm should,
in principle, enhance the comprehensiveness of the extract (Salton et al.,
1996b).
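One way the segmented selection described above might be realized is sketched below. The text does not say how fractional shares are rounded or whether bushiness is counted over the whole map or within a segment, so this sketch assumes a largest-remainder allocation and link counts taken from the whole map. The function name and data layout are likewise hypothetical.

```python
def segmented_bushy_path(links, segments, n):
    """Select about n paragraphs, with at least one from every segment.

    `links` maps each paragraph index to the set of paragraphs it is
    linked to; `segments` is a list of lists of paragraph indices in
    text order. The result can differ from n when there are more
    segments than n or when a segment is shorter than its quota.
    """
    total = sum(len(seg) for seg in segments)
    extra = max(n - len(segments), 0)
    # One guaranteed slot per segment, plus a share of the remaining
    # slots proportional to the segment's length.
    shares = [extra * len(seg) / total for seg in segments]
    quotas = [1 + int(share) for share in shares]
    # Hand out the slots lost to rounding, largest fractional part first.
    leftover = extra - sum(int(share) for share in shares)
    by_fraction = sorted(range(len(segments)),
                         key=lambda s: shares[s] - int(shares[s]),
                         reverse=True)
    for s in by_fraction[:leftover]:
        quotas[s] += 1
    path = []
    for seg, quota in zip(segments, quotas):
        # Bushiest paragraphs of this segment, earlier paragraph on ties.
        ranked = sorted(seg, key=lambda p: (-len(links[p]), p))
        path.extend(ranked[:quota])
    return sorted(path)
```

In the six-paragraph example used to check this sketch, a global bushy selection of three paragraphs would come entirely from the first segment, while the segmented selection keeps one paragraph from the second segment.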
20% Global Depth-First Path
Para 3: Telecommunications, broadly speaking, the process of transmitting
information in an electronic form between any two devices by using any kind
of transmission line. More specifically, however, telecommunications refers
to the process ...
Para 7: Each telecommunications device uses hardware, which connects a
device to the transmission line, and software, which makes it possible for
a device to transmit information.
Para 14: Among the different kinds of software are terminal-emulation,
file-transfer, host, and network software. Terminal-emulation software
makes it possible for a device to perform the same functions as a terminal.
File-transfer software is ...
Para 20: Finally, most host computers can communicate properly with only
one kind of terminal. To communicate with such computers,
terminal-emulation software is installed on a computer to make the linkage
succeed.
Para 32: An information-retrieval service leases time on a host computer to
terminals, so that these terminals are able to retrieve information from
the host computer. An example is CompuServe Information Services. To gain
access to ...
Table 2: Text for global depth-first path for article Telecommunications
Figure 4: Segmented bushy and augmented segmented bushy paths for article
Telecommunications
20% Segmented Bushy Path
Para 5: The devices used in telecommunications can be computers, terminals
(devices that transmit and receive information), and peripheral equipment
such as printers (see Computer, and see Office Systems). The transmission
line used ...
Para 14: Among the different kinds of software are terminal-emulation,
file-transfer, host, and network software. Terminal-emulation software
makes it possible for a device to perform the same functions as a terminal.
File-transfer software is ...
Para 16: Three major categories of telecommunication applications can be
discussed here: host-terminal, file-transfer, and computer-network
communications.
Para 32: An information-retrieval service leases time on a host computer to
terminals, so that these terminals are able to retrieve information from
the host computer. An example is CompuServe Information Services. To gain
access to ...
Para 39: Certain telecommunication methods have become standard in the
telecommunications industry as a whole, because if two devices use
different standards they are unable to communicate properly. Standards are
developed in ...
Table 3: Text for segmented bushy path for article Telecommunications
Augmented segmented bushy path
Typically, authors introduce a new topic (for example, a "Section") in the
first few paragraphs that discuss the topic in the text. If proper
sectioning information were available for all documents, a reasonable
summarization scheme might be to select the first paragraph from each
section. A segmented bushy path might skip the less bushy introductory
paragraph of a segment in favor of a more bushy paragraph which is
somewhere in the middle of the segment. This is quite detrimental to the
readability of the summary. To remedy this problem, we define the augmented
segmented bushy path, which always picks the introductory paragraph from a
segment, and other bushy paragraphs based upon the length requirements of
the summary.
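A minimal sketch of the augmented variant follows. The text fixes only the first step (take the introductory paragraph of every segment); how the remaining slots are filled is not spelled out, so this example assumes they go to the bushiest remaining paragraphs anywhere in the document. The function name and data layout are hypothetical.

```python
def augmented_segmented_bushy_path(links, segments, n):
    """Take the first paragraph of every segment, then fill up to n.

    `links` maps each paragraph index to the set of paragraphs it is
    linked to; `segments` is a list of lists of paragraph indices in
    text order. Remaining slots go to the bushiest paragraphs not yet
    chosen (an assumed rule), with the earlier paragraph winning ties.
    """
    intro = [seg[0] for seg in segments if seg]
    chosen = set(intro)
    rest = [p for seg in segments for p in seg if p not in chosen]
    rest.sort(key=lambda p: (-len(links[p]), p))
    chosen.update(rest[:max(n - len(intro), 0)])
    return sorted(chosen)
```

When n is smaller than the number of segments, the sketch still returns one paragraph per segment, so the extract can be longer than requested.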
Figure 3 shows a 20% global bushy path and a global depth-first path
constructed for the article on telecommunications. The corresponding texts
for these paths are shown in Tables 1 and 2. Note that the bushy path does
not include any material from the last two segments (on telecommunication
services and standards). The depth-first path misses out the segment on
standards. On the other hand, the segmented bushy path (see Figure 4 and
Table 3) does include a paragraph from each of the last two segments and is
more indicative of the contents of the article than either of the global
paths. But the segmented bushy path picks paragraphs from the middle of a
segment, for example paragraph 5 in the first segment and paragraph 32 in
the segment on telecommunications services. Presenting a paragraph from a
topic without introducing the topic is once again detrimental to the
readability of the summary. This could be fixed by augmenting the segmented
bushy paths by forcing them to select the introductory paragraph from every
segment. The augmented segmented bushy path for this article (see Table 4)
is actually a very good indicative summary for the article.
3 Experiment 
Several automatic extraction schemes, including the above, have been
proposed earlier (Salton et al., 1996b; Salton et al., 1996a). General
features of the extracts produced by these different algorithms have been
noted, based on manually examining some of the extracts. However, objective
evaluation of these algorithms has always been problematic. In (Salton and
Singhal, 1995), an attempt was made to evaluate the summaries based on
ranked retrieval. Since relevance judgments were not available for passages
or extracts, the available relevance judgments for full documents were
extrapolated to the extracts. However, the portion of a document that is
relevant to a query may well get left out of a passage, and so results
obtained from such an evaluation are unreliable.
20% Augmented Segmented Bushy Path
Para 3: Telecommunications, broadly speaking, the process of transmitting
information in an electronic form between any two devices by using any kind
of transmission line. More specifically, however, telecommunications refers
to the process ...
Para 14: Among the different kinds of software are terminal-emulation,
file-transfer, host, and network software. Terminal-emulation software
makes it possible for a device to perform the same functions as a terminal.
File-transfer ...
Para 16: Three major categories of telecommunication applications can be
discussed here: host-terminal, file-transfer, and computer-network
communications.
Para 28: Public telecommunication services are a relatively recent
development in telecommunications. The four kinds of services are network,
information-retrieval, electronic-mail, and bulletin-board services.
Para 39: Certain telecommunication methods have become standard in the
telecommunications industry as a whole, because if two devices use
different standards they are unable to communicate properly. Standards are
developed in ...
Table 4: Text for augmented segmented bushy path for article
Telecommunications
Since the goal of our summarization schemes is to automate a process that
has traditionally been done manually, a comparison of automatically
generated extracts with those produced by humans would provide a reasonable
evaluation of these methods. We assume that a human would be able to
identify the most important paragraphs in an article most effectively. If
the set of paragraphs selected by an automatic extraction method has a high
overlap with the human-generated extract, the automatic method should be
regarded as effective. Thus, our evaluation method takes the following
form: a user submits a document to the system for summarization; in one
case, the system presents a summary generated by another person; in the
other, it produces an automatically generated extract. The user compares
the two summaries (manual and automatic) to his/her own notion of an ideal
extract. To evaluate the automatic methods, we compare the user's
'satisfaction' in the two cases. Such an evaluation methodology has its
shortcomings; for example, it does not account for the readability aspect
of a summary. It also ignores the fact that user satisfaction is related to
whether a user has seen the full article or not. Unfortunately, given the
lack of a good testbed for evaluating automatic summarization, it is the
best we can do.
Fifty articles were selected from the Funk and Wagnalls Encyclopedia (Funk
and Wagnalls, 1979). For each article, two extracts were constructed
manually. One of these extracts was used as the manual summary. The other
one, which then becomes a user's (ideal) summary, is used as the oracle to
compare the performance of the manual summary and an automatic summary. The
following instructions were given to those who constructed the manual
extracts:
    Please read through the articles. Determine which n paragraphs are the
    most important for summarizing this article. n = MAX(5, 1/5th the total
    number of paragraphs (round to the next higher number for fractions)).
    Mark the paragraphs which you chose.
The resulting database of 100 manual summaries (two for each of the fifty
articles) was used in the final evaluation of the automatic methods.
Summaries were then automatically generated for the articles, using each of
the four methods described above. In each case, the automatic and manual
extracts had the same number of paragraphs.³
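The length rule quoted in the instructions above amounts to a one-line computation. The sketch below is only an illustration; the function name `target_extract_length` is made up for this example.

```python
def target_extract_length(num_paragraphs):
    """n = MAX(5, one fifth of the paragraphs, rounded up)."""
    one_fifth_rounded_up = (num_paragraphs + 4) // 5  # integer ceiling of n / 5
    return max(5, one_fifth_rounded_up)
```

For an article with 13 paragraphs the floor of five applies, while an article with 48 paragraphs gets an extract of 10 paragraphs.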
In manual summarization by paragraph extraction, there are certain
paragraphs in a text that certainly belong in a summary extract, but then
there are many paragraphs whose importance is subjectively judged by the
individual doing the extraction. To reduce the effect of the arbitrariness
introduced by an individual's subjective notions, for very short articles,
we asked our subjects to extract at least five paragraphs, hoping that the
intersection of the two manual summaries will indeed yield the most
important paragraphs in an article. The articles used in our evaluation had
anywhere between thirteen and forty-eight content paragraphs. The current
implementation of the Smart system also considers the section headings,
etc., as individual paragraphs. Such paragraphs were marked as non-content
and were ignored in the summarization process.
4 Results and Discussion 
The following scenario was assumed for evaluation of the automatic
summaries:
• A user walks up to the system and presents an article for summarization.
• In the first case, the system asks another human to do the summarization
  and presents it to the user. The user compares this summary to his/her
  own notion of an ideal summary.
• In the second case, the system automatically generates a summary and
  returns it to the user. The user again compares this summary to his/her
  own notion of an ideal summary.
• The user satisfaction in the above two cases is measured by the "degree
  of overlap" between the summary presented by the system and the user's
  notion of an ideal summary.
If the user's satisfaction is about the same in the above two cases, then
our automatic summarization schemes are summarizing as well as a human
would summarize by paragraph extraction.
³ Different users could count paragraphs differently. Thus, for a few
articles, the lengths of the two manually generated summaries were
different. In such cases, the automatic procedures took the average of
these two lengths as the target length for the extract.
For each automatic summarization algorithm, four quantities were computed:
1. Optimistic evaluation: Since the two manual extracts for an article are
   different, the amount of overlap between an automatic and a manual
   extract depends on which manual extract is selected for comparison. The
   optimistic evaluation for an algorithm is done by selecting the manual
   extract with which the automatic extract has a higher overlap, and
   measuring this overlap. This is the same as using the human whose notion
   of an ideal extract is closer to the automatic extract as our user.
2. Pessimistic evaluation: Analogously, a pessimistic evaluation is done by
   selecting the manual extract with which the automatic extract has a
   lower overlap. This is the same as using the human whose notion of an
   ideal extract is more dissimilar to the automatic extract as our user.
   This, in some sense, is the worst case scenario.
3. Intersection: For each article, an intersection of the two manually
   generated summaries is computed. The fact that the paragraphs in this
   intersection were deemed important by both the readers suggests that
   they may, in fact, be the most important paragraphs in the article. We
   compute the percentage of these paragraphs that is included in the
   automatic extract.
4. Union: We also calculate the percentage of automatically selected
   paragraphs that is selected by at least one of the two users. This is,
   in some sense, a precision measure, since it provides us with a sense of
   how often an automatically selected paragraph is potentially important.
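The four quantities listed above can be sketched for a single article as follows. The text does not give a formula for the "degree of overlap", so this example assumes it is the percentage of a manual extract's paragraphs that also appear in the automatic extract; the function name and the dictionary keys are likewise hypothetical.

```python
def evaluate_extract(auto, manual_1, manual_2):
    """Compute the four evaluation measures (in percent) for one article.

    Each argument is a collection of paragraph numbers. Overlap with a
    manual extract is taken as the share of that manual extract found
    in the automatic extract (an assumption).
    """
    auto, m1, m2 = set(auto), set(manual_1), set(manual_2)

    def overlap(manual):
        return 100.0 * len(auto & manual) / len(manual)

    both = m1 & m2      # paragraphs chosen by both readers
    either = m1 | m2    # paragraphs chosen by at least one reader
    return {
        "optimistic": max(overlap(m1), overlap(m2)),
        "pessimistic": min(overlap(m1), overlap(m2)),
        # Undefined when the two readers agree on no paragraph at all.
        "intersection": 100.0 * len(auto & both) / len(both) if both else None,
        "union": 100.0 * len(auto & either) / len(auto),
    }
```

Averaging each measure over all articles would give per-algorithm figures of the kind reported in the results table.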
In our experimentation, we observed that many subjects tend to select
paragraph 3 in the summaries. This is because this paragraph is the first
content paragraph in an article and tends to be a dictionary-style
definition for the article. For example, for article 15930 (Monopoly), this
paragraph reads:
    Monopoly, economic situation in which there is only a single seller or
    producer of a commodity or a service. For a monopoly to be effective,
    there must be no practical substitutes for the product or service sold,
    and no serious threat of the entry of a competitor into the market.
    This enables the seller to control the price.
Such dictionary-style definitions are generally liked by readers and thus
are usually included in a summary by our subjects.
In general, in written texts, the first content paragraph tends to be an
introductory paragraph and is a good starting paragraph for summarization.
For the encyclopedia articles, we use this information and we always
include paragraph 3 in the bushy and the depth-first summaries. This
paragraph might be missed by the segmented bushy paths but is recaptured by
the augmented segmented bushy paths. In case such collection-specific
information is not available, we use the first paragraph with a reasonable
number of links to the rest of the paragraphs as the introductory paragraph
(Salton and Singhal, 1995).
Table 5 shows the overlap for the two manual extracts, and the different
evaluation measures averaged over all fifty articles, for the bushy,
depth-first, segmented bushy, and augmented segmented bushy extracts. In
addition to using these four methods, extracts were also generated for the
articles by selecting the required number of paragraphs at random. To
eliminate any advantage that the bushy, depth-first, and augmented
segmented bushy extracts might have due to the presence of the introductory
paragraph, paragraph 3 is always included in the random paths. The
evaluation results for these random extracts are also shown in the table.
Random selection of paragraphs serves as the weakest possible baseline. If
an algorithm does not perform noticeably better than a random extract, then
it is certainly doing a poor job of summarization. Also, Brandow, Mitze,
and Rau found in (Brandow et al., 1995) that simply selecting the first few
sentences (the lead sentences) produced the most acceptable summaries. To
test their findings in our environment, we also selected the first 20% of
the paragraphs of an article and used them as yet another automatic
summary.
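The two baselines described above are simple enough to sketch directly. The function names, the list-of-paragraph-numbers input, and the optional `rng` parameter are assumptions made for this illustration; the first entry of `paragraphs` stands for the first content paragraph (paragraph 3 in the encyclopedia articles).

```python
import random


def random_extract(paragraphs, n, rng=random):
    """First content paragraph plus n - 1 others drawn at random.

    `paragraphs` lists the content paragraph numbers in text order.
    `rng` may be the `random` module or a `random.Random` instance,
    which makes the draw repeatable in experiments.
    """
    first, rest = paragraphs[0], paragraphs[1:]
    k = min(max(n - 1, 0), len(rest))
    return sorted([first] + rng.sample(rest, k))


def lead_extract(paragraphs, n):
    """The first n content paragraphs, as written by the author."""
    return list(paragraphs[:n])
```

Passing a seeded `random.Random` instance keeps the random baseline reproducible from one run to the next.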
Manual Extracts 
The most unexpected result of our experiment was the low level of agreement
between the two human subjects. The overlap between the two manual extracts
is only 46% on average, i.e., an extract generated by one person is likely
to cover 46% of the information that is regarded as most important by
another person. This ratio suggests that two humans disagree on more than
half the paragraphs that they consider to be critical. In addition, as
indicated above, the first paragraph of these encyclopedia articles is a
general introduction to the article and is often selected by both subjects:
in 50% of the cases in which the intersection between the two users'
extracts is a single paragraph, this paragraph is the first one. This
increases the chances of overlap between the two manual extracts. If we
exclude this special paragraph from the article, the overlap figures for
two humans will be even worse.
The lack of consensus between users on which paragraphs are important can
be explained as follows. On a first reading, users earmarked certain
paragraphs as important. Some of these paragraphs were then eliminated, in
order to reduce the extract to the stipulated size. Often, the choice
between which paragraphs to keep and which to exclude was a difficult one,
and in such situations, some arbitrariness is bound to creep in. This fact
casts some shadows on the utility of automatic text summarization by text
extraction. It is possible that the user satisfaction might be higher in
reality when the true user does not read the portion of an article not
presented to him/her by the summarization system and does not get an
opportunity to form his/her own ideal view of an extract.
Overlap between manual extracts: 46%

Algorithm             Optimistic (%)  Pessimistic (%)  Intersection (%)  Union (%)
Global bushy               45.60           30.74            47.33          55.16
Global depth-first         43.98           27.76            42.33          52.48
Segmented bushy            45.48           26.37            38.17          52.95
Augmented seg. bushy       46.66           27.59            41.83          55.44
Random                     39.16           22.07            38.47          44.24
Initial (Lead)             47.99           29.50            50.00          55.97

Table 5: Evaluation measures for automatic extraction methods
Automatic Extracts 
Table 5 indicates that global bushy paths and augmented segmented bushy
paths produce the best extracts among the four paths considered in this
study. 55% of the paragraphs selected by the process were considered
important by at least one user. Optimistically speaking, a global bushy or
an augmented segmented bushy path may be expected to agree approximately
46% with a user. This number is on par with the agreement between two
humans (45.81%). This result is reassuring in terms of the method's
viability for generating good extracts, since the scheme performs as well
as a human.
About 47% of the paragraphs deemed important by both users are included in
the bushy extract for an article. This figure is somewhat disheartening. We
expected a better coverage of these vital paragraphs by our extracts. A
further study of these paragraphs might reveal some properties that users
look for in a paragraph to decide its importance. It might then be possible
to automate this selection process. We also identified the articles for
which the intersection of the two user summaries is a single paragraph. For
78% of these articles, this paragraph was included in the bushy path.
Segmented bushy paths perform worse than expected. This is because the
first paragraph of an article is very often selected by users, and
segmented bushy paths occasionally omit this paragraph. This results in a
decrease in the overlap between automatic and manual extracts. In contrast,
the other paths are guaranteed to include the first paragraph, and perform
better. But, in general, the performance of segmented bushy paths was
satisfactory (45.48% overlap with the user in the optimistic method).
Similarly, the performance of the depth-first path was also satisfactory.
All paths achieved the minimum requirement of performing significantly
better than a random extract.
But more interestingly, we observe that extracts produced
by selecting the first few paragraphs of the articles
also performed comparably to the best paragraph extraction
scheme. Admittedly, our evaluation methodology
does not assess the readability of a summary,
which was one of the main motivations for moving
from a sentence-based extraction strategy to paragraph-
based extraction. Very likely, the lead summary
will outperform all other automatic summaries in
terms of readability. We believe this because automatic
summaries are a forced concatenation of paragraphs distributed
all across a document, whereas a lead summary
is a nicely coherent sequence of paragraphs, as written
by the author. Overall, the lead summaries are comparable
to the best summarization strategy and could be
more readable than all other summaries. This truth is
rather discouraging for the feasibility of automatic summarization
by text extraction, but agrees with the observations
in (Brandow et al., 1995). News reports, used
in (Brandow et al., 1995), frequently contain a leading
paragraph that summarizes the story contained in the rest
of the report. Likewise, in the encyclopedia articles used
in this study, the first paragraphs usually define the topic,
and provide a general outline of it.
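The lead summary described above needs no analysis of the text at all, which is what makes its competitive performance notable. A minimal sketch, assuming the article is already split into a list of paragraphs and that the extract size is supplied by the caller (the function name, the size parameter, and the toy article are hypothetical):

```python
def lead_extract(paragraphs, size):
    """Return the first `size` paragraphs in document order.

    If the article has fewer than `size` paragraphs, the whole article
    is returned.
    """
    if size < 0:
        raise ValueError("size must be non-negative")
    return paragraphs[:size]


# Toy article: the opening paragraphs define and outline the topic.
article = [
    "Definition of the topic.",
    "General outline.",
    "History.",
    "Details.",
    "Related topics.",
]
summary = lead_extract(article, 2)
print(summary)
```

Because the selected paragraphs are contiguous and keep the author's order, the result reads as written, unlike an extract assembled from paragraphs scattered across the document.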
To sum up:
• The good news is that, interpreted in light of the fact
that the overlap between the two manual extracts is,
on average, 46%, and given the enormous reduction
in the amount of resources required⁴, our results
indicate that automatic methods for extraction compare
very favorably with manual extraction.
• But the bad news is that a summary formed by
extracting the initial paragraphs of an article is as
good as the best automatic summary and might just
be more readable from a user's perspective. This
brings into question the overall utility of automatic
text summarization by text (sentence or paragraph)
extraction.
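The reduction in resources can be quantified from the figures reported in footnote 4. The sketch below is plain arithmetic on those figures; it assumes, for the sake of comparison, that a human would spend the quoted time on each of the same summaries, which the footnote does not state explicitly. The variable names are hypothetical.

```python
# Figures taken from footnote 4.
SYSTEM_MINUTES_TOTAL = 15        # time for the whole batch
SUMMARIES_PER_ARTICLE = 3
ARTICLES = 50
HUMAN_MINUTES_PER_SUMMARY = 10   # assumed to apply to every summary

n_summaries = SUMMARIES_PER_ARTICLE * ARTICLES
system_seconds_per_summary = SYSTEM_MINUTES_TOTAL * 60 / n_summaries
human_minutes_total = HUMAN_MINUTES_PER_SUMMARY * n_summaries
speedup = human_minutes_total / SYSTEM_MINUTES_TOTAL

print(n_summaries, system_seconds_per_summary, human_minutes_total, speedup)
```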
It is possible that the articles used in
this study (encyclopedia articles) and in (Brandow et al.,
1995) (news articles) have a structure that yields very
good summaries simply by extracting the initial part of
an article. It will be interesting to see if observations from
this study and from (Brandow et al., 1995) carry over to
other domains that are unlike encyclopedias and news
(for example, legal documents or U.S. patents).
In our studies with text summarization (by text extraction),
we have always felt a very strong need for a good
evaluation test-bed. Lack of good objective evaluation
techniques for text summarization has always been the
biggest problem in all our work, and has consistently
discouraged more experimentation and exploration of
interesting research possibilities (like the one mentioned
above regarding articles from other domains).
⁴The system took about 15 minutes to generate 3 summaries
for each of 50 articles. A human would require about 10 minutes
to produce a summary for a typical article from this set.
5 Conclusion 
In this study, we have tried to evaluate automatic summarization
methods proposed earlier. If a good testbed
for evaluating summaries were available, the evaluation
methodology adopted in this study could be improved,
but we believe it is the best we can currently do. Under
our evaluation scheme, the four extraction algorithms
examined perform comparably, but they produced significantly
better extracts than a random selection of paragraphs.
The absolute performance figures are not high,
but given the low overlap between two human-generated
extracts, they are eminently satisfactory. However, this
wide variation between users brings us to the question of
whether summarization by automatic extraction is feasible.
If humans are unable to agree on which paragraphs
best represent an article, it is unreasonable to expect an
automatic procedure to identify the best extract, whatever
that might be. We also find that presenting the user
with the initial part of an article is as good as employing
any "intelligent" text extraction scheme. In summary,
automatic summarization by extraction is admittedly an
imperfect method. However, at the moment, it does appear
to be the only domain-independent technique which
performs reasonably.
Acknowledgments 
We are deeply indebted to the late Professor Gerard Salton
for all his guidance during the initial stages of this work.
Without the invaluable advice and support of Professor
Salton, this work would not have been possible.
We thank Nawaaz Ahmed, David Fielding, Nicholas
Howe, S. Ravikumar, Cynthia Robinson, and Divakar
Vishwanath for generating extracts for the articles used
in the evaluation process.

References 
R. Brandow, K. Mitze, and L. F. Rau, Automatic Condensation
of Electronic Publications by Sentence Selection,
Information Processing and Management, 31(5),
675-685, 1995.
L. Earl, Experiments in Automatic Extracting and Indexing,
Information Storage and Retrieval, 6(4), 313-334,
October 1970.
Funk and Wagnalls New Encyclopedia, Funk and Wagnalls,
New York, 1979.
M. A. Hearst and C. Plaunt, Subtopic Structuring for
Full-Length Document Access, Proc. SIGIR '93, 59-68,
Association for Computing Machinery, New York,
November 1993.
J. Kupiec, J. Pedersen, and F. Chen, A Trainable Document
Summarizer, Proc. SIGIR '95, 68-73, Association
for Computing Machinery, New York, July 1995.
H. P. Luhn, The Automatic Creation of Literature Abstracts,
IBM Journal of Research and Development,
2(2), 159-165, 1958.
C. D. Paice, Constructing Literature Abstracts by Computer:
Techniques and Prospects, Information Processing
and Management, 26(1), 171-186, 1990.
G. Salton, ed., The SMART Retrieval System -- Experiments
in Automatic Document Processing, Prentice
Hall Inc., NJ, 1971.
G. Salton and M. McGill, Introduction to Modern Information
Retrieval, McGraw Hill Book Co., New York,
1983.
G. Salton, Automatic Text Processing -- the Transformation,
Analysis and Retrieval of Information by Computer,
Addison-Wesley Publishing Co., Reading, MA, 1989.
G. Salton and J. Allan, Selective Text Utilization and
Text Traversal, Proc. Hypertext '93, Association for
Computing Machinery, New York, November 1993,
131-144.
G. Salton, J. Allan, C. Buckley, and A. Singhal, Automatic
Analysis, Theme Generation and Summarization
of Machine-Readable Texts, Science 264, 1421-1426,
3 June 1994.
G. Salton, J. Allan, and A. Singhal, Automatic Text Decomposition
and Structuring, Information Processing
and Management, 32(2), 127-138, 1996.
G. Salton and A. Singhal, Selective Text Traversal, Technical
Report, TR 95-1549, Department of Computer
Science, Cornell University, Ithaca, NY, September
1995.
G. Salton, A. Singhal, C. Buckley, and M. Mitra, Automatic
Text Decomposition Using Text Segments and
Text Themes, Hypertext '96, The Seventh ACM Conference
on Hypertext, Association for Computing Machinery,
New York, 53-65.
G. Salton, A. Singhal, M. Mitra, and C. Buckley, Automatic
Text Structuring and Summarization, Information
Processing and Management, 33(2), 193-208,
1997.
