A Text-Extraction Based Summarizer 
Tomek Strzalkowski, Gees C. Stein and G. Bowden Wise
GE Corporate Research & Development
1 Research Circle
Niskayuna, NY 12309, USA
Abstract 
We present an automated method of generating 
human-readable summaries from a variety of text 
documents including newspaper articles, business 
reports, government documents, even broadcast 
news transcripts. Our approach exploits an em- 
pirical observation that much of the written text 
displays certain regularities of organization and
style, which we call the Discourse Macro Structure 
(DMS). A summary is therefore created to reflect 
the components of a given DMS. In order to pro- 
duce a coherent and readable summary we select 
continuous, well-formed passages from the source 
document and assemble them into a mini-document 
within a DMS template. In this paper we describe 
an automated summarizer that can generate both 
short indicative abstracts, useful for quick scanning 
of a list of documents, as well as longer informative 
digests that can serve as surrogates for the full text. 
The summarizer can assist the users of an informa- 
tion retrieval system in assessing the quality of the 
results returned from a search, preparing reports 
and memos for their customers, and even building 
more effective search queries. 
Introduction 
A good summarization tool can be of enormous help 
for those who have to process large amounts of doc- 
uments. In information retrieval one would bene- 
fit greatly from having content-indicative quick-read 
summaries supplied along with the titles returned 
from search. Similarly, application areas like rout- 
ing, news on demand, market intelligence and topic 
tracking would benefit from a good summarization 
tool. 
Perhaps the most difficult problem in designing 
an automatic text summarizer is to define what
a summary is, and how to tell a summary from a 
non-summary, or a good summary from a bad one. 
The answer depends in part upon who the summary 
is intended for, and in part upon what it is meant to 
achieve, which in large measure precludes any objec- 
tive evaluation. A good summary should at least be 
a good reflection of the original document while be- 
ing considerably shorter than the original thus sav- 
ing the reader valuable reading time. 
In this paper we describe an automatic way to 
generate summaries from text-only documents. The 
summarizer we developed can create general and 
topical indicative summaries, and also topical in- 
formative summaries. Our approach is domain- 
independent and takes advantage of certain organi- 
zation regularities that were observed in news-type 
documents. The system participated in a third- 
party evaluation program and turned out to be one 
of the top-performing summarizers. Its quality/length
ratio was especially good, since our summaries
tend to be very short (10% of the original
length).
The summarizer is still undergoing improvement 
and expansion in order to be able to summarize a 
wide variety of documents. It is also used success- 
fully as a tool to solve different problems, like infor- 
mation retrieval and topic tracking. 
Task Description and Related Work 
For most of us, a summary is a brief synopsis of the 
content of a larger document, an abstract recount- 
ing the main points while suppressing most details. 
One purpose of having a summary is to quickly learn 
some facts, and decide what you want to do with the 
entire story. Depending on how they are meant to be 
used one can distinguish between two kinds of sum- 
maries. Indicative summaries are not a replacement 
for the original text but are meant to be a good re- 
flection of the kind of information that can be found 
in the original document. Informative summaries 
can be used as a replacement of the original docu- 
ment and should contain the main facts of the docu- 
ment. Independent of their usage summaries can be 
classified as general summaries or topical summaries. 
A general summary addresses the main points of the 
document ignoring unrelated issues. A topical sum- 
mary will report the main issues relevant to a certain 
topic, which might have little to do with the main 
topic of the document. Both summaries might give 
very different impressions of the same document. In 
this paper we describe a summarizer that summa- 
rizes one document, text only, at a time. It is capa- 
ble of producing both topical and generic indicative 
summaries, and topical informative summaries. 
Our early inspiration, and a benchmark, have 
been the Quick Read Summaries, posted daily off 
the front page of the New York Times on-line edition
(http://www.nytimes.com). These summaries, pro- 
duced manually by NYT staff, are assembled out of 
passages, sentences, and sometimes sentence frag- 
ments taken from the main article with very few, if 
any, editorial adjustments. The effect is a collection 
of perfectly coherent tidbits of news: the who, the 
what, and when, but perhaps not why. Indeed, these 
summaries leave out most of the details, and cannot 
serve as surrogates for the full article. Yet, they allow
the reader to learn some basic facts, and then to
choose which stories to open. 
This kind of summarization, where appropriate 
passages are extracted from the original text, is very 
efficient, and arguably effective, because it doesn't 
require generation of any new text, and thus low- 
ers the risk of misinterpretation. It is also relatively 
easier to automate, because we only need to identify 
the suitable passages among the other text, a task 
that can be accomplished via shallow NLP and sta- 
tistical techniques. Nonetheless, there are a num- 
ber of serious problems to overcome before an ac- 
ceptable quality summarizer can be built. For one, 
quantitative methods alone are generally too weak 
to deal adequately with the complexities of natural 
language text. For example, one popular approach 
to automated abstract generation has been to select 
key sentences from the original text using statisti- 
cal and linguistic cues, perform some cosmetic ad- 
justments in order to restore cohesiveness, and then 
output the result as a single passage, e.g., (Luhn 
1958) (Paice 1990) (Brandow, Mitze, & Rau 1995) 
(Kupiec, Pedersen, & Chen 1995). The main advan- 
tage of this approach is that it can be applied to 
almost any kind of text. The main problem is that 
it hardly ever produces an intelligible summary: the 
resulting passage often lacks coherence, is hard to 
understand, sometimes misleading, and may be just 
plain incomprehensible. In fact, some studies show 
(cf. (Brandow, Mitze, & Rau 1995)) that simply selecting
the first paragraph from a document tends
to produce better summaries than a sentence-based 
algorithm. 
A far more difficult, but arguably more "human- 
like" method to summarize text (with the possi- 
ble exception of editorial staff of some well-known 
dailies) is to comprehend it in its entirety, and then 
write a summary "in your own words." What this 
amounts to, computationally, is a full linguistic anal- 
ysis to extract key text components from which a 
summary could be built. One previously explored 
approach, e.g., (Ono, Sumita, & Miike 1994) (McK- 
eown & Radev 1995), was to extract discourse struc- 
ture elements and then generate the summary within 
this structure. In another approach, e.g., (DeJong 
1982) (Lehnert 1981) pre-defined summary tem- 
plates were filled with text elements obtained using 
information extraction techniques. Marcu (Marcu 
1997a) uses rhetorical structure analysis to guide the 
selection of text segments for the summary; simi- 
larly Teufel and Moens (Teufel & Moens 1997) ana- 
lyze argumentative structure of discourse to extract 
appropriate sentences. While these approaches can 
produce very good results, they are yet to be demon- 
strated in a practical system applied to a reasonable 
size domain. The main difficulty is the lack of an effi- 
cient and reliable method of computing the required 
discourse structure. 
Our Approach 
The approach we adopted in our work falls some- 
where between simple sentence extraction and text- 
understanding, although philosophically we are 
closer to NYT cut-and-paste editors. We overcome 
the shortcomings of sentence-based summarization 
by working at the paragraph level instead. Our summarizer
takes advantage of paragraph
segmentation and the underlying Discourse Macro
Structure of news texts. Both are discussed below.
Paragraphs 
Paragraphs are generally self-contained units, more 
so than single sentences; they usually address a single
thought or issue, and their relationships with
the surrounding text are somewhat easier to trace. 
This notion has been explored by Cornell's group 
(Salton et al. 1994) to design a summarizer that 
traces inter-paragraph relationships and selects the 
"best connected" paragraphs for the summary. As
in Cornell's system, our summaries are made up of
paragraphs taken out of the original text. In addi- 
tion, in order to obtain more coherent summaries, 
we impose some fundamental discourse constraints 
on the generation process, but avoid a full discourse 
analysis. 
We would like to note at this point that the sum- 
marization algorithm, as described in detail later, 
does not explicitly depend on nor indeed require in- 
put text that is pre-segmented into paragraphs. In 
general, any length passages can be used, although 
this choice will impact the complexity of the solu- 
tion. Lifting well-defined paragraphs from a docu- 
ment and then recombining them into a summary 
is relatively more straightforward than recombining 
other text units. For texts where there is no struc- 
ture at all, as in a closed-captioned stream in broad- 
cast television, there are several ways to create arti- 
ficial segments. The simplest would be to use fixed 
word-count passages. Or, content-based segmen- 
tation techniques may be applicable, e.g., Hearst's 
Text-Tiling (Hearst 1997). 
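The simplest fallback mentioned above, fixed word-count passages, can be sketched as follows. This is only an illustration: the function names, the default of 50 words per passage, and the use of blank lines as the paragraph marker are assumptions, not details of the summarizer described in this paper.

```python
def fixed_length_passages(text: str, words_per_passage: int = 50) -> list[str]:
    """Split unstructured text into passages of roughly equal word count."""
    if words_per_passage <= 0:
        raise ValueError("words_per_passage must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + words_per_passage])
        for i in range(0, len(words), words_per_passage)
    ]


def segment(text: str, words_per_passage: int = 50) -> list[str]:
    """Use blank-line paragraph breaks when present, else fixed-size chunks."""
    paragraphs = [" ".join(p.split()) for p in text.split("\n\n") if p.strip()]
    if len(paragraphs) > 1:
        return paragraphs
    # No usable structure (e.g. a closed-caption stream): fall back to chunks.
    return fixed_length_passages(text, words_per_passage)
```

A content-based segmenter could replace `fixed_length_passages` without changing the rest of the pipeline, since later steps only need a list of passages.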
On the other hand, we may argue that essentially 
any length segments of text can be used so long 
as one could figure out a way to reconnect them 
into paragraph-like passages even if their boundaries 
were somewhat off. This is actually not unlike deal- 
ing with the texts with very fine grained paragraphs, 
as is often the case with news-wire articles. For 
such texts, in order to obtain an appropriate level of 
chunking, some paragraphs need to be reconnected 
into longer passages. This may be achieved by track- 
ing co-references and other text cohesiveness devices, 
and their choice will depend upon the initial segmen- 
tation we work up from. 
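As a rough sketch of this reconnection step, the code below links a passage to its predecessor whenever it opens with a word that typically points backward. The cue list and both function names are hypothetical stand-ins; tracking co-references properly would require real anaphora resolution rather than a word list.

```python
import re

# Hypothetical cue list: words that, at the start of a passage, suggest it
# leans on the preceding text. This is an assumption made for illustration.
BACKWARD_CUES = {"he", "she", "it", "they", "this", "that", "these", "those",
                 "but", "however", "moreover", "such"}


def background_links(passages: list[str]) -> dict[int, int]:
    """Map passage index n+1 -> n when passage n+1 opens with a backward cue."""
    links = {}
    for i in range(1, len(passages)):
        first_word = re.match(r"\W*(\w+)", passages[i])
        if first_word and first_word.group(1).lower() in BACKWARD_CUES:
            links[i] = i - 1
    return links


def reconnect(passages: list[str]) -> list[list[int]]:
    """Merge chains of linked passages into groups of consecutive indices."""
    links = background_links(passages)
    groups: list[list[int]] = []
    for i in range(len(passages)):
        if i in links and groups:
            groups[-1].append(i)  # extend the current chain
        else:
            groups.append([i])    # start a new passage group
    return groups
```

For very fine-grained news-wire paragraphs, the resulting index groups play the role of the longer, paragraph-like passages discussed above.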
Discourse Macro Structure of a Text 
It has been observed, e.g., (Rino & Scott 1994),
(Weissberg & Buker 1990), that certain types of 
texts, such as news articles, technical reports, re- 
search papers, etc., conform to a set of style 
and organization constraints, called the Discourse 
Macro Structure (DMS) which help the author to 
achieve a desired communication effect. For in- 
stance, both physics papers and abstracts align 
closely with the Introduction-Methodology-Results- 
Discussion-Conclusion macro structure. It is likely 
that other scientific and technical texts will also con- 
form to this or similar structure, since this is exactly 
the structure suggested in technical writing guide- 
books, e.g. (Weissberg & Buker 1990). One obser- 
vation to make here is that perhaps a proper sum- 
mary or an abstract should reflect the DMS of the 
original document. On the other hand, we need to 
note that a summary can be given a different DMS, 
and this choice would reflect our interpretation of 
the original text. A scientific paper, for example, 
can be treated as a piece of news, and serve as a 
basis of an un-scientific summary. 
News reports tend to be built hierarchically out 
of components which fall roughly into one of the 
two categories: the What-Is-The-News category, and
the optional Background category. The Background, 
if present, supplies the context necessary to under- 
stand the central story, or to make a follow-up story 
self-contained. The Background section is optional: 
when the background is common knowledge or is implied
in the main news section, it can be, and usually
is, omitted. The What-Is-The-News section covers
the new developments and the new facts that make 
the news. This organization is often reflected in the 
summary, as illustrated in the example below from 
NYT 10/15/97, where the highlighted portion pro- 
vides the background for the main news: 
SPIES JUST WOULDN'T COME IN FROM COLD 
WAR, FILES SHOW 
Terry Squillacote was a Pentagon lawyer 
who hated her job. Kurt Stand was a union 
leader with an aging beatnik's slouch. Jim Clark 
was a lonely private investigator. [A 200-page
affidavit filed last week by] the Federal Bureau
of Investigation says the three were out-of-work 
spies for East Germany. And after that state 
withered away, it says, they desperately reached 
out for anyone who might want them as secret 
agents. 
In this example, the two passages are non- 
consecutive paragraphs in the original text; the 
string in the square brackets at the opening of the 
second passage has been omitted in the summary. 
Here the human summarizer's actions appear rela- 
tively straightforward, and it would not be difficult 
to propose an algorithmic method to do the same. 
This may go as follows: 
1. Choose a DMS template for the summary; e.g.,
Background+News.
2. Select appropriate passages from the original text
and fill the DMS template.
3. Assemble the summary in the desired order; delete
extraneous words.
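The three steps can be illustrated with a small sketch. It assumes that passages have already been assigned to template slots and that the words to delete are marked with square brackets, as in the example above; the function name and both conventions are hypothetical, and choosing the passages is the hard part that this sketch leaves out.

```python
import re


def assemble_summary(template: list[str], passages: dict[str, str]) -> str:
    """Fill template slots in order and drop spans marked [...] as extraneous."""
    parts = []
    for slot in template:
        passage = passages.get(slot)
        if not passage:
            continue  # optional components, such as Background, may be absent
        cleaned = re.sub(r"\[[^\]]*\]", " ", passage)
        cleaned = " ".join(cleaned.split())
        if cleaned:
            # Re-capitalize, since a deletion may expose a lower-case word.
            parts.append(cleaned[0].upper() + cleaned[1:])
    return " ".join(parts)
```

Called with the template `["Background", "News"]`, the function emits the background passage first and the news passage second, whatever their order in the source.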
It is worth noting here that the background- 
context passage is critical for understanding of this 
summary, but as such provides essentially no rele- 
vant information except for the names of the people 
involved. Incidentally, this is precisely the informa- 
tion required to make the summary self-contained, if 
for no other reason than to supply the antecedents to 
the anaphors in the main passage (the three, they). 
The Algorithm 
The summarizer can work in two modes: generic 
and topical. In the generic mode, it simply sum- 
marizes the main points of the original document. 
In the topical mode, it takes a user supplied state- 
ment of interest, a topic, and derives a summary 
related to this topic. A topical summary is thus 
usually different from the generic summary of the 
same document. The summarizer can produce both 
indicative and informative summaries. An indica- 
tive summary, typically 5-10% of the original text, 
is when there is just enough material retained from 
the original document to indicate its content. An 
informative summary, on the other hand, typically 
20-30% of the text, retains all the relevant facts that 
a user may need from the original document, that is, 
it serves as a condensed surrogate, a digest. 
The process of assembling DMS components into 
a summary depends upon the complexity of the dis- 
course structure itself. For news or even for scientific 
texts, it may be just a matter of concatenating components
together with a little "cohesiveness glue",
which may include deleting some obstructing sen- 
tences, expanding acronyms, adjusting verb forms, 
etc. In a highly specialized domain (e.g., court rul- 
ings) the final assembly may be guided by a very 
detailed pattern or a script that conforms to specific 
style and content requirements. 
Below we present a 10-step algorithm for gener- 
ating summaries of news-like texts. This is the al- 
gorithm underlying our current summarizer. The 
reader may notice that there is no explicit provi- 
sion for dealing with DMS structures here. Indeed, 
the basic Background+News summary pattern has 
been tightly integrated into the passage selection 
and weighting process. This obviously streamlines 
the summarization process, but it also reflects the 
notion that news-style summarization is in many 
ways basic and subsumes other more complex sum- 
marization requirements. 
THE GENERALIZED SUMMARIZATION ALGORITHM 
s0: Segment text into passages. Use any available
handles, including indentation, SGML, empty
lines, sentence ends, etc. If no paragraph or
sentence structure is available, use approximately
equal size chunks.
s1: Build a paragraph-search query out of the content
words, phrases and other terms found in the title,
a user-supplied topic description (if available), as
well as the terms occurring frequently in the text.
s2: Reconnect adjacent passages that display strong
cohesiveness by one-way background links, using
handles such as outgoing anaphors and other
backward references. A background link from passage
N+1 to passage N means that if passage N+1
is selected for a summary, passage N must also be
selected. Link consecutive passages until all references
are covered.
s3: Score all passages, including the linked groups,
with respect to the paragraph-search query. Assign
a point for each co-occurring term. The goal
is to maximize the overlap, so multiple occurrences
of the same term do not increase the score.
s4: Normalize passage scores by their length, taking
into account the desired target length of the summary.
The goal is to keep summary length as close
to the target length as possible. The weighting
formula is designed so that small deviations from
the target length are acceptable, but large deviations
will rapidly decrease the passage score. The
exact formulation of this scheme depends upon
the desired tradeoff between summary length and
content. The following is the basic formula for
scoring passage P of length l against the passage-search
query Q and the target summary length of
t, as used in current version of our summarizer:
$$\mathrm{NormScore}(P, Q) = \frac{\mathrm{RawScore}(P, Q)}{\cdots + 1}$$
where:
$$\mathrm{RawScore}(P, Q) = \sum_{q \in Q} \mathrm{weight}(q, P) + \mathrm{prem}(P)$$
with sum over unique content terms q, and
$$\mathrm{weight}(q, P) = \begin{cases} 1 & \text{if } q \in P \\ 0 & \text{otherwise} \end{cases}$$
with prem(P) as a cumulative non-content
based score premium (cf. s7).
s5: Discard all passages with length in excess of 1.5
times the target length. This reduces the number
of passage combinations the summarizer has
to consider, thus improving its efficiency. The decision
whether to use this condition depends upon
our tolerance to length variability. In extreme
cases, to prevent obtaining empty summaries, the
summarizer will default to the first paragraph of
the original text.
s6: Combine passages into groups of 2 or more based
on their content, composition and length. The
goal is to maximize the score, while keeping the
length as close to the target length as possible.
Any combination of passages is allowed, including
non-consecutive passages, although the original
ordering of passages is retained. If a passage
attached to another through a background link is
included into a group, the other passage must also
be included, and this rule is applied recursively.
We need to note that the background links work
only one way: a passage which is a background for
another passage, may stand on its own if selected
into a candidate summary.
s7: Recalculate scores for all newly created groups.
This is necessary, and cannot be obtained as a
sum of scores because of possible term repetitions.
Again, discard any passage groups longer than 1.5
times the target length. Add premium scores to
groups based on the inverse degree of text discontinuity
measured as a total amount of elided
text material between the passages within a group.
Add other premiums as applicable.
s8: Rank passage groups by score. All groups become
candidate summaries.
s9: Repeat steps s6 through s8 until there is no
change in top-scoring passage group through 2
consecutive iterations. Select the top scoring passage
or passage group as the final summary.
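The scoring and selection steps (s3 through s8) can be sketched in a few lines of Python. Several parts are assumptions rather than the authors' implementation: the length-dependent part of the NormScore denominator is not recoverable from the text here, so `norm_score` uses a stand-in penalty that keeps only the stated property (scores fall as length deviates from the target) and the trailing "+ 1"; premiums are left at zero; the stopword list and tokenizer are toy placeholders; and candidate groups are enumerated exhaustively up to a hypothetical `max_group` size instead of being grown iteratively as in s9.

```python
from itertools import combinations

STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "was", "for"}


def terms(text: str) -> set[str]:
    """Lower-cased content words; a stand-in for real term extraction."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def raw_score(text: str, query: set[str], premium: float = 0.0) -> float:
    """s3: one point per unique query term present, plus any premium."""
    return len(query & terms(text)) + premium


def norm_score(raw: float, length: int, target: int) -> float:
    """s4 with an ASSUMED length penalty (the source formula is incomplete)."""
    return raw / (abs(length - target) / target + 1)


def close_over_links(indices: set[int], links: dict[int, int]) -> set[int]:
    """s6: add background passages recursively (links map n+1 -> n, one way)."""
    closed = set(indices)
    stack = list(indices)
    while stack:
        i = stack.pop()
        if i in links and links[i] not in closed:
            closed.add(links[i])
            stack.append(links[i])
    return closed


def summarize(passages, query_text, target, links=None, max_group=2):
    """Return (passage indices in original order, score) of the best group."""
    links = links or {}
    query = terms(query_text)
    candidates = {}
    for size in range(1, max_group + 1):
        for combo in combinations(range(len(passages)), size):
            group = tuple(sorted(close_over_links(set(combo), links)))
            text = " ".join(passages[i] for i in group)
            length = len(text.split())
            if length > 1.5 * target:  # s5 / s7 length cut-off
                continue
            # s7: score the group as a whole so repeated terms count once.
            candidates[group] = norm_score(raw_score(text, query), length, target)
    if not candidates:  # s5: default to the first paragraph
        return (0,), 0.0
    best = max(candidates, key=lambda g: (candidates[g], -len(g)))  # s8
    return best, candidates[best]
```

Scoring each group on its concatenated text, rather than summing passage scores, is what keeps a term repeated across passages from being counted twice.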
Implementation and some Examples 
The summarizer has been implemented in C++ with 
a Java interface as a demonstration system, primar- 
ily for news summarization. At this time it can run 
in both batch and interactive modes under Solaris, 
and it can also be accessed via the Web using a Java
compatible browser. Below, we present a few example
summaries. For easy orientation, paragraphs
are numbered in the order they appear in the original
text.
TITLE: Mrs. Clinton Says U.S. Needs 'Ways That 
Value Families' 
SUMMARY TYPE: indicative 
TARGET LENGTH: 5% 
TOPIC: none 
(6) The United States, Mrs. Clinton said, must become "a nation 
that doesn't just talk about family values but acts in ways that 
values families." 
SUMMARY TYPE: indicative 
TARGET LENGTH: 15% 
TOPIC: Hidden cameras used in news reporting 
(4) Roone Arledge, the president of ABC News, defended the 
methods used to report the segment and said ABC would ap- 
peal the verdict. 
(5) "They could never contest the truth" of the broadcast, Arledge
said. "These people were doing awful things in these stores."
(6) Wednesday's verdict was only the second time punitive dam-
ages had been meted out by a jury in a hidden-camera case. It
was the first time punitive damages had been awarded against
producers of such a segment, said Neville L. Johnson, a lawyer
in Los Angeles who has filed numerous hidden-camera cases
against the major networks.
(7) Many journalists argue that hidden cameras and other undercover
reporting techniques have long been necessary tools for
exposing vital issues of public policy and health. But many 
media experts say television producers have overused them in 
recent years in a push to create splashy shows and bolster rat- 
ings. The jurors, those experts added, may have been lashing 
out at what they perceived as undisciplined and overly aggres- 
sive news organizations. 
TITLE: U.S. Buyer of Russian Uranium Said to Put 
Profits Before Security 
SUMMARY TYPE: informative 
TARGET LENGTH: 25%
TOPIC: nuclear nonproliferation
(1) In a postscript to the Cold War, the American government-
owned corporation that is charged with reselling much of Rus-
sia's military stockpile of uranium as civilian nuclear reactor
fuel turned down repeated requests this year to buy material
sufficient to build 400 Hiroshima-size bombs.
(2) The incident raises the question of whether the corporation,
the U.S. Enrichment Corp., put its own financial interest ahead
of the national-security goal of preventing weapons-grade ura-
nium from falling into the hands of terrorists or rogue states.
(7) The corporation has thus far taken delivery from Russia of
reactor fuel derived from 13 tons of bomb-grade uranium.
"The nonproliferation objectives of the agreement are being
achieved," a spokesman for the Enrichment Corp. said.
(8) But since the beginning of the program, skeptics have ques-
tioned the wisdom of designating the Enrichment Corp. as
Washington's "executive agent" in managing the deal with
Russia's Ministry of Atomic Energy, or MINATOM.
(19) Domenici, chairman of the energy subcommittee of the Sen-
ate Appropriations Committee, which is shepherding the pri-
vatization plan through Congress, was never informed of the
offer by the administration. After learning of the rebuff to
the Russians, he wrote to Curtis asking that the Enrichment
Corp. "be immediately replaced as executive agent" and warn-
ing that "under no circumstances should the sale of the USEC
proceed until this matter is resolved." Once Domenici entered
the fray, the administration changed its tune.
(20) Curtis sent a letter to Domenici stating that all the problems
blocking acceptance of the extra six tons had been solved. Peo-
ple close to the administration said that the Enrichment Corp.
has now been advised to buy the full 18-ton shipment in 1997.
Moreover, Curtis quickly convened a new committee to moni-
tor the Enrichment Corp. for signs of foot-dragging.
Evaluation 
Our program has been tested on a variety of news- 
like documents, including Associated Press news- 
wire messages, articles from the New York Times, 
The Wall Street Journal, Financial Times, San Jose 
Mercury, as well as documents from the Federal Reg- 
ister, and the Congressional Record. The summa- 
rizer is domain independent, and it can be easily 
adapted to most European languages. It is also 
very robust: we used it to derive summaries of 
thousands of documents returned by an information 
retrieval system. Early results from these evalua- 
tions indicate that the summaries generated using 
our DMS method offer an excellent tradeoff between 
time/length and accuracy. Our summaries tend to 
be shorter and contain less extraneous material than 
those obtained using different methods. This is fur- 
ther confirmed by the favorable responses we re- 
ceived from the users. 
Thus far there has been only one systematic multi- 
site evaluation of summarization approaches, con- 
ducted in early 1998, organized by U.S. DARPA 1 in 
the tradition of Message Understanding Conferences 
(MUC) (DAR 1993) and Text Retrieval Conferences 
(TREC) (Harman 1997a), which have proven suc- 
cessful in stimulating research in their respective 
areas: information extraction and information re- 
trieval. The summarization evaluation focused on 
content representativeness of indicative summaries 
and comprehensiveness of informative summaries. 
Other factors affecting the quality of summaries, 
such as brevity, readability, and usefulness were eval- 
uated indirectly, as parameters of the main scores. 
For more details see (Firmin & Sundheim 1998). 
The indicative summaries were scored for rele- 
vance to pre-selected topics and compared to the 
classification of respective full documents. In this 
evaluation, a summary was considered successful if it 
preserved the original document's relevance or non- 
relevance to a topic. Moreover, the recall and preci- 
sion scores were normalized by the length of the sum- 
mary (in words) relative to the length of the original 
document, as well as by the clock time taken by the 
evaluators to reach their topic relevance decisions. 
The first normalization measured the degree of con- 
tent compression provided by the summaries, while 
the second normalization was intended to gauge 
their readability. The results showed a strong corre- 
lation between these two measures, which may indi- 
cate that readability was in fact equated with mean- 
ingfulness, that is, hard to read summaries were 
quickly judged non-relevant. 
For all the participants the best summaries scored 
better than the fixed-length summaries. When nor- 
malized for length our summarizer had the highest 
score for best summaries and took the second place 
for fixed-length summaries. The F-scores for indica- 
tive topical summaries (best and fixed-length) were 
very close for all participants. Apparently it is easier
to generate a topical summary than a general summary.
Normalizing for length did move our score
up, but again, there was no significant difference between
participants.
1 (The U.S.) Defense Advanced Research Projects Agency
The informative (topical) summaries were scored 
for their ability to provide answers to who, what, 
when, how, etc. questions about the topics. These 
questions were unknown to the developers, so sys- 
tems could not directly extract facts to satisfy them. 
Again, scores were normalized for summary length, 
but no time normalization was used. This evalua- 
tion was done on a significantly smaller scale than 
for the indicative summaries, simply because scor- 
ing for question answering was more time consuming 
for the human judges than categorization decisions. 
This evaluation could probably be recast as a categorization
problem, if we only assumed that the questions
in the test were the topics, and that a summary
needs to be relevant to multiple topics. 
Informative summaries were generated using the 
same general algorithm with two modifications. 
First, the expected summary length was set at 30% 
of the original, following an observation by the con- 
ference organizers while evaluating human generated 
summaries. Second, since the completeness of an in- 
formative summary was judged on the basis of it 
containing satisfactory answers to questions which 
were not part of the topic specification, we added 
extra scores to passages containing possible answers: 
proper names (who, where) and numerics (when, 
how much). Finally, we note that the test data used 
for evaluation, while generally of news-like genre, 
varied greatly in content, style and subject matter;
therefore domain-independence was critical.
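The second modification, extra scores for passages that may contain answers, could look roughly like the sketch below. The heuristics are assumptions chosen for illustration: a capitalized word that is not sentence-initial stands in for a proper name, any token made of digits counts as a numeric, and the `per_hit` weight of 0.5 is arbitrary. The paper does not specify how names and numerics were detected or weighted.

```python
import re


def answer_premium(passage: str, per_hit: float = 0.5) -> float:
    """Hypothetical bonus for proper names (who, where) and numerics (when, how much)."""
    hits = 0
    for sentence in re.split(r"(?<=[.!?])\s+", passage):
        for position, token in enumerate(sentence.split()):
            word = token.strip(".,;:!?\"'()")
            if re.fullmatch(r"\d[\d,.]*%?", word):
                hits += 1  # numeric: possible answer to when / how much
            elif position > 0 and re.fullmatch(r"[A-Z][a-z]+", word):
                hits += 1  # capitalized mid-sentence: possible who / where
    return per_hit * hits
```

The returned value would be added to a passage's raw score as one of the non-content premiums. Abbreviations ending in a period (such as "Corp.") will confuse the naive sentence splitter, so a real system would need a proper tokenizer.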
Again our summarizer performed quite well, al- 
though the results are less significant since the ex- 
periment was carried out on such a small scale. The 
results were separated out for three different queries. 
For two queries the system was very close to the top 
performing system, and for the third query the sys- 
tem had an F-score of about 0.61 versus 0.77 for the 
best system. 
In general we are quite pleased with the summa- 
rizer performance, especially since our system was 
not trained on the kind of texts that we had to sum- 
marize. 
Related Work and Future Work 
The current summarizer is still undergoing improve- 
ment and adaptation in order to be able to sum- 
marize more than a single text news document at 
a time. At the same time we are investigating how 
summarization can be used in related but different 
problems. Both will be described below. 
A better and more flexible summarizer 
Currently our summarizer is especially tuned for En- 
glish one-document text-only news summarization. 
While we are still working on improving this, we also 
want the system to be able to summarize a wider va- 
riety of documents. Many challenges remain, includ- 
ing summarization of non-news documents, multi- 
modal documents (such as web pages), foreign lan- 
guage documents and (small or large) groups of doc- 
uments covering one or more topics. 
Typically, a user needs summarization the most 
when dealing with a large number of documents. 
Therefore, the next logical step is to summarize 
more than one document at a time. At the moment
we are focusing on multi-document (cross-document)
summarization of English text-only news
documents. Just as for single-document summariza- 
tion, multi-document summarization can be generic 
or topical and indicative or informative. Other fac- 
tors that will influence the types of summary are the 
number of documents (a large versus a small set) and 
the variety of topics discussed by the documents (are 
the documents closely related or can they cover very 
different topics). Presentation of a multi-document
summary offers a wide variety of choices. One could create
one large text summary that gives an overview of all 
the main issues mentioned in all summaries. Or per- 
haps give different short summaries for similar doc- 
uments. If the number of documents is very large 
it might be best to create nested summaries with 
high-level descriptions and the possibility to 'zoom 
in' on a subgroup with a more specific summary. A 
user will probably want to have the ability to trace 
information in a summary back to its original doc- 
ument; source information should be a part of the 
summary. If one views summarization in the context
of tracking a topic, the main goal of the summary
might be to show the new information each
successive document contains, while not repeating information
already mentioned in previous documents.
Another type of summary might highlight the sim- 
ilarities documents have (e.g., all these documents 
are on protection of endangered species) and point
out the differences they have (e.g., one on bald
eagles, some on Bengal tigers, ...). As one can see,
there are many questions to be answered and the 
answers depend partially on the task environment 
the summarizer will be used in. 
Currently we are focusing on summarizing a
small set of text-only documents (around 20) all on
a similar topic. The summary will reflect the main
points/topics discussed by the documents. A topic
discussed by more than one document should be
mentioned only once in the summary, together with its
different sources. When generating the summary we
want to ensure coherence by placing related topics
close to each other. The main issues we are addressing
are the detection of similar information, in order to
avoid repetition in the summary, and the detection
of related information, in order to generate a coherent
summary. This work is currently in progress.
Our next step will be summarizing large amounts of 
similar information. 
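
The requirement above, that a topic shared by several documents appear only once together with its sources, can be illustrated with a minimal Python sketch. It is not the method under development: the word-overlap (Jaccard) similarity, the 0.5 threshold, and the names tokens, jaccard and merge_similar are all assumptions chosen for illustration.

```python
import re


def tokens(text):
    """Return the set of lowercased words in a passage."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Word-overlap similarity between two token sets (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def merge_similar(passages, threshold=0.5):
    """Fold near-duplicate passages together and record their sources.

    passages: list of (source_id, text) pairs.
    threshold: assumed similarity cut-off, not a value from the paper.
    Returns a list of {"text": ..., "sources": [...]} dicts in which each
    passage is kept once, with every document that contained it.
    """
    merged = []
    for source_id, text in passages:
        toks = tokens(text)
        for entry in merged:
            if jaccard(toks, entry["tokens"]) >= threshold:
                # Similar information already present: record the source only.
                if source_id not in entry["sources"]:
                    entry["sources"].append(source_id)
                break
        else:
            # No similar passage found, so this one is new to the summary.
            merged.append({"text": text, "tokens": toks, "sources": [source_id]})
    return [{"text": e["text"], "sources": e["sources"]} for e in merged]
```

Each retained passage carries the list of documents it came from, so a reader can trace it back to the originals. Placing related passages next to each other would need a second, looser similarity test, which this sketch does not attempt.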
Applying summarization to different 
problems 
Information retrieval (IR) is the task of selecting
documents from a database in response to a user's query,
and ranking these documents according to relevance.
Currently we are investigating the use of summarization
in order to build (either automatically or
with the help of the user) more effective information
need statements for an automated document search
system. The premise is quite simple: use the user's
initial statement of information need to sample
the database for documents, summarize the returned
documents topically, then add selected summaries to
the initial statement to make it richer and more specific.
Appropriate summaries can be added either by the
user, who reads the summaries, or automatically.
Both approaches are described in our
other paper appearing in this volume.
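
The expansion loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the select callback, and the automatic selection rule (keep a summary that shares at least two terms with the query) are hypothetical and are not taken from the system described here.

```python
import re


def terms(text):
    """Return the set of lowercased words in a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def expand_query(initial_query, summaries, select=None, min_shared_terms=2):
    """Append selected topical summaries to the initial query statement.

    select: optional callable(summary) -> bool standing in for a user who
        reads each summary and accepts or rejects it.
    min_shared_terms: assumed automatic rule used when select is None.
    """
    query_terms = terms(initial_query)
    if select is None:
        # Automatic mode: keep summaries that overlap enough with the query.
        def select(summary):
            return len(query_terms & terms(summary)) >= min_shared_terms
    chosen = [s for s in summaries if select(s)]
    return " ".join([initial_query] + chosen)
```

Passing a select function models the interactive mode, in which the user decides which summaries to add; omitting it falls back to the assumed automatic rule.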
The task of tracking a topic consists of identifying
those information segments in an information stream
that are relevant to a certain topic. Topic tracking is
one of the three main TDT (Topic Detection and
Tracking) tasks, and one that we hope to use our
summarizer for. The information stream consists of
news, either from a TV broadcast or a radio broadcast.
Speech from these programs has been recognized
by a state-of-the-art automatic speech recognition
system and also transcribed by human transcriptionists.
A topic is defined implicitly by a set
of training stories that are given to be on this topic. 
The basic idea behind our approach is simple. We 
use the training stories to create a set of keywords 
(the query). Since we process continuous news the 
input is not segmented into paragraphs or any other 
meaningful text unit. Before applying our summa- 
rizer each story is divided into equal word-size seg- 
ments. We summarize every story using our query, 
and use similarity of the summary to the query to 
decide whether a story is on topic or not. We are 
still in the process of refining our system and hope 
to have our first results soon. Initial results sug- 
gest that this is a viable approach. It is encouraging 
to notice that the absence of a paragraph structure 
does not prevent the system from generating useful 
summaries. 
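
The tracking procedure outlined above (build a keyword query from the training stories, cut each story into equal word-size segments, summarize against the query, then compare the summary with the query) can be sketched as follows. This is a rough sketch under stated assumptions, not the actual system: the stopword list, the frequency-based query, the segment-ranking stand-in for the summarizer, the coverage measure, and all thresholds and names are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is",
             "was", "for", "by", "that", "with"}


def words(text):
    """Return the lowercased words of a text, in order."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_query(training_stories, size=10):
    """Keyword query: the most frequent non-stopwords in the training stories."""
    counts = Counter(w for story in training_stories
                     for w in words(story) if w not in STOPWORDS)
    return {w for w, _ in counts.most_common(size)}


def segment(story, segment_len=20):
    """Split an unsegmented story into segments of segment_len words.

    The last segment may be shorter than the others.
    """
    toks = words(story)
    return [toks[i:i + segment_len] for i in range(0, len(toks), segment_len)]


def summarize(story, query, segment_len=20, keep=2):
    """Stand-in summarizer: keep the segments with the most query words."""
    ranked = sorted(segment(story, segment_len),
                    key=lambda seg: sum(w in query for w in seg),
                    reverse=True)
    return [w for seg in ranked[:keep] for w in seg]


def on_topic(story, query, threshold=0.3, segment_len=20, keep=2):
    """Decide topicality from the share of query words found in the summary."""
    if not query:
        return False
    summary = set(summarize(story, query, segment_len, keep))
    return len(summary & query) / len(query) >= threshold
```

Because segmentation depends only on word counts, the sketch needs no paragraph boundaries, which is the situation with continuous broadcast news.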
Conclusions
We have developed a method to derive quick-read 
summaries from news-like texts using a number of 
shallow NLP techniques and simple quantitative 
methods. In our approach, a summary is assem- 
bled out of passages extracted from the original text, 
based on a pre-determined Background-News dis- 
course template. The result is a very efficient, ro- 
bust, and portable summarizer that can be applied 
to a variety of tasks. These include brief indica- 
tive summaries, both generic and topical, as well as 
longer informative digests. Our method has been 
shown to produce summaries that offer an excellent 
tradeoff between text reduction and content preser- 
vation, as indicated by the results of the government- 
sponsored formal evaluation. 
The present version of the summarizer can han- 
dle most written texts with well-defined paragraph 
structure. While the algorithm is primarily tuned
to newspaper-like articles, we believe it can produce
news-style summaries for other factual texts, as long
as their rhetorical structures are reasonably linear
and no prescribed stylistic organization is expected.
Texts that do not meet these conditions will require
a more advanced discourse analysis along with more
elaborate DMS templates.
We used the summarizer to build effective search 
topics for an information retrieval system. This 
has been demonstrated to produce dramatic per- 
formance improvements in TREC evaluations. We 
believe that this topic expansion approach will also 
prove useful in searching very large databases where 
obtaining a full index may be impractical or impos- 
sible, and accurate sampling will become critical. 
Our future development plans will focus on im- 
proving the quality of the summaries by implement- 
ing additional passage scoring functions. Further 
plans include handling more complex DMSs and
adapting the summarizer to texts other than
news, as well as to texts written in foreign languages. 
We plan further experiments with topic expansion 
with the goal of achieving a full automation of the 
process while retaining the performance gains. 
Acknowledgements 
We would like to acknowledge the significant con- 
tributions to this project from two former members 
of our group: Fang Lin and Wang Jin. This work 
was supported in part by the Defense Advanced Re- 
search Projects Agency under Tipster Phase-3 Con- 
tract 97-F157200-000 through the Office of Research 
and Development. 

References 

Brandow, R.; Mitze, K.; and Rau, L. 1995. Automatic conden- 
sation of electronic publications by sentence selection. Informa- 
tion Processing and Management 31(5):675-686. 

DARPA. 1993. Proceedings of the 5th Message Understanding 
Conference, San Francisco, CA: Morgan Kaufmann Publishers.

DARPA. 1996. Tipster Text Phase 2: 24 month Conference,
Morgan Kaufmann.

DeJong, G. G. 1982. An overview of the FRUMP system. In Lehnert,
W., and Ringle, M., eds., Strategies for Natural Language
Processing. Lawrence Erlbaum, Hillsdale, NJ.

Firmin, T., and Sundheim, B. 1998. Tipster/SUMMAC evaluation
analysis. In Tipster Phase III 18-month Workshop. 

Harman, D., ed. 1997a. The 5th Text Retrieval Conference 
(TREC-5), number 500-253. National Institute of Standards
and Technology. 

Hearst, M. 1997. TextTiling: Segmenting text into multi-paragraph
subtopic passages. Computational Linguistics
23(1):33-64. 

Kupiec, J.; Pedersen, J.; and Chen, F. 1995. A trainable document
summarizer. In Conference of the ACM Special Interest
Group on Information Retrieval (SIGIR), 68-73.

Lehnert, W. 1981. Plot units and narrative summarization.
Cognitive Science 4:293-331. 

Luhn, H. 1958. The automatic creation of literature abstracts. 
IBM Journal 159-165. 

Marcu, D. 1997a. From discourse structures to text summaries. 
In Proceedings of the ACL Workshop on Intelligent, Scalable
Text Summarization, 82-88. 

Marcu, D. 1997b. The rhetorical parsing of natural language 
texts. In Proceedings of 35th Annual Meetings of the ACL, 
96-103. 

McKeown, K., and Radev, D. 1995. Generating summaries of 
multiple news articles. In Proceedings of the 8th Annual ACM 
SIGIR Conference on R&D in IR.

Ono, K.; Sumita, K.; and Miike, S. 1994. Abstract generation
based on rhetorical structure extraction. In Proceedings
of the International Conference on Computational Linguistics
(COLING-94), 344-348.

Paice, C. 1990. Constructing literature abstracts by computer: 
techniques and prospects. Information Processing and Man- 
agement 26(1):171-186. 

Rino, L., and Scott, D. 1994. Content selection in summary gen- 
eration. In Third International Conference on the Cognitive 
Science of Natural Language Processing. 

Salton, G.; Allan, J.; Buckley, C.; and Singhal, A. 1994. Au- 
tomatic analysis, theme generation, and summarization of ma- 
chine readable texts. Science 264:1412-1426. 

Strzalkowski, T., and Wang, J. 1996. A self-learning universal
concept spotter. In Proceedings of the 17th International Conference
on Computational Linguistics (COLING-96), 931-936.

Teufel, S., and Moens, M. 1997. Sentence extraction as classification
task. In Proceedings of the ACL Workshop on Intelligent,
Scalable Text Summarization.

Weissberg, R., and Buker, S. 1990. Writing up Research: Experimental
Research Report Writing for Students of English.
Prentice Hall, Inc. 
