A Proposal for Task-based Evaluation of Text Summarization Systems 
Thérèse Firmin Hand
Department of Defense 
9800 Savage Rd 
Ft Meade, MD 20755-6000, USA 
tfirmin@romulus.ncsc.mil
Abstract 
Evaluation is a key part of any research
and development effort, but the goals and
focus of evaluations are often narrow in
scope, addressing a specific algorithm or
technique, or analyzing a single result.
All of the evaluation work done to date on
text summarization systems has been by
the developers of individual systems, usually
to study and improve sentence selection
criteria. Under TIPSTER III, DARPA
is sponsoring a task-based evaluation of
multiple text summarization systems.
The focus of this evaluation will be on
user needs, and the feasibility of applying
summarization technology to a variety of
tasks.
1 Introduction 
The explosion of on-line textual material and the
advances in text-processing technology have provided
an important opportunity for broad application
of text summarization systems. Numerous
techniques for deriving summaries from full text
documents have already been implemented, and
there are several commercial summarization products
available. The summaries generated by these
systems are potentially useful in a variety of settings.
In 1997, the US Government will begin a
Defense Advanced Research Projects Agency
(DARPA)-sponsored program under the TIPSTER
umbrella to evaluate full text summarization systems,
to provide feedback to researchers and commercial
institutions on the utility of various
approaches to specific summarization tasks. TIPSTER,
discussed in more detail later, is a DARPA
initiative with participation from multiple US
government agencies and research and commercial
institutions to push the state of the art in text
processing technologies.
2 Concepts of Text Summarization 
Automatic summaries are usually described in
terms of certain key features which relate to the
concepts of intent, focus, and coverage.
• Intent describes the potential use of the summary,
either indicative or informative. Indicative
summaries, used in this context, provide
just enough information to judge the relevancy
of the full text. Informative or substantive summaries
serve as substitutes for the full documents,
retaining all important details.
• Focus refers to the scope of the summary,
either generic or user-directed. A generic summary
is based on the main concept(s) of a document,
whereas a user- or goal-directed
summary is based on the topic of interest indicated
by the recipient of the summary.
• Coverage indicates whether the summary is
based on a single document or multiple documents.
Much of the historical work in automatic text
summarization has been geared towards the creation
of indicative, generic summaries of single
documents. For example, the work of Luhn
(1958), Edmundson (1969), Johnson et al. (1993)
and Brandow et al. (1995) all generated this type
of summary, although their approaches have
included different combinations of statistical and
linguistic techniques. Luhn (1958) considered frequency
of word occurrence within a document
and the position of the word in a sentence;
Edmundson (1969) looked at cue words, title and
heading words, and structural indicators;
Johnson et al. (1993) used indicator phrases; and
Brandow et al. (1995) applied sentence weighting
using signature word selection. Most of these
approaches claim some degree of domain independence;
however, they have been tested only on
a specific type of data, such as newspaper articles
(Brandow et al. 1995) or technical literature
(Edmundson 1969).
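The frequency-based idea underlying Luhn's approach can be sketched in a few lines. This is a minimal illustration, not Luhn's exact algorithm: the stoplist, the scoring function, and the sentence splitter below are all simplifying assumptions.

```python
import re
from collections import Counter

def luhn_style_summary(text, num_sentences=2, stopwords=None):
    """Score each sentence by the average corpus frequency of its
    content words, then return the top sentences in document order."""
    stopwords = stopwords or {"the", "a", "an", "of", "in", "to", "and", "is"}
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Document-wide frequency of content words.
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in stopwords]
    freq = Counter(words)

    def score(sentence):
        tokens = [w for w in re.findall(r"[a-z]+", sentence.lower())
                  if w not in stopwords]
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])  # restore document order
    return [sentences[i] for i in chosen]
```

Sentences dense in frequently occurring content words rise to the top, which is the essential intuition behind most statistical sentence-selection summarizers.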
More recently, the scope of research has
expanded to include informative, user-directed,
and multi-document summaries. Reimer and
Hahn (1988), Maybury (1993), and McKeown
and Radev (1995) used knowledge-based
approaches to generate informative summaries
that can serve as substitutes for the original document.
The expansion in focus to include user-directed
summaries has been influenced by research in the
information retrieval community on passage-based
retrieval, as in the work of Knaus et al.
(1996). Also, advances in statistical learning algorithms,
such as those implemented by Kupiec et
al. (1995) and Aone et al. (1997), have combined
generic summaries and user-customization, allowing
the user to affect the content of the summaries
by manipulating sentence extraction features.
The potential for multi-document summarization
as proposed by the work of Strzalkowski
(1996) and Mani and Bloedorn (1997) is based in
part on advances in information retrieval and
information extraction performance.
3 Previous Evaluations 
During the course of their development, most of
the above systems were subject to some form of
evaluation. Many of these evaluations relied on
the presence of a human-generated target abstract,
or the notion of a single 'best' abstract, although
there is fairly uniform acceptance of the belief
that any number of acceptable abstracts could
effectively represent the content of a single document.
Human-generated abstracts attempt to capture
the central concept(s) of a document using
the terminology of the document, along the lines
of a generic summary. The comparisons made
between the human-generated versus machine-generated
summaries were intended primarily for
the developers' own benefit, and evaluate the technology
itself, rather than the utility of the technology
for a given task. Other evaluations did focus
on specific tasks and potential uses of automatic
summaries, but only with respect to a single system
and a limited document set.
Many different techniques were attempted in
the area of intrinsic or developer-oriented evaluations,
which judge the quality of summaries.
Edmundson (1969) compared sentence selection
in the automatic abstracts to the target abstracts,
and also performed a subjective evaluation of the
content. Johnson et al. (1993) proposed matching
a template of manually generated key concepts
with the concepts included in the abstract, and
performed one sample abstract evaluation. Paice
and Jones (1993) used a set of statistics to determine
if the summary effectively captured the focal
concepts, the non-focal concepts, and conclusions.
Using a strictly statistical measure, Kupiec
et al. (1995) calculated the percentage of sentence
matches and partial matches between their automatic
summary and a manually generated
abstract. The main problem with this type of evaluation
is its reliance on the notion of a single 'correct'
abstract. Since many different
representations of a document can form an effective
summary, this is an inappropriate measure.
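A sentence-match measure of the kind Kupiec et al. computed can be sketched as follows. This is a simplification: here "exact match" means string equality after lowercasing, and "partial match" is merely nonempty word overlap, both assumptions rather than their actual definitions.

```python
def sentence_match_rate(auto_sentences, target_sentences):
    """Return (exact, partial) match rates of an automatic summary's
    sentences against a manually generated target abstract."""
    target_lower = [t.lower() for t in target_sentences]
    target_words = [set(t.split()) for t in target_lower]
    exact = partial = 0
    for s in auto_sentences:
        s_low = s.lower()
        if s_low in target_lower:
            exact += 1
        elif any(set(s_low.split()) & tw for tw in target_words):
            partial += 1
    n = len(auto_sentences) or 1
    return exact / n, partial / n
```

The measure's weakness is exactly the one noted above: it scores agreement with a single target abstract, so a good summary built from different sentences scores zero.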
In extrinsic or task-oriented evaluations, the
information retrieval notion of relevancy of a document
to a specific topic is the common measure
for summarization testing. Miike et al. (1994)
analyzed key sentence coverage and also recorded
timing and precision/recall statistics to make relevance
decisions based on summaries for a
domain-specific summarizer. Brandow et al.
(1995) had news analysts compare the summaries
generated using statistical and natural language
processing (NLP) techniques to summaries using
the initial sentences (called the "lead summaries")
of the document. Brandow et al. (1995) discovered
that in general, experienced news analysts
felt that the lead summaries were more acceptable
than the summaries created using sophisticated
NLP techniques. Mani and Bloedorn (1997) generated
similar precision/recall and timing measures
for an information retrieval experiment
using a graph search and matching technique and
Task            Intent      Focus          Coverage         Evaluation Decision   Quantitative measures
Categorization  Indicative  Generic        Single document  appropriate category  time, accuracy
Adhoc           Indicative  User-directed  Single document  relevant to topic     time, accuracy

TABLE 1. Proposed Evaluation
learned that their summaries were effective
enough to support accurate retrieval.
4 Proposed Evaluation 
Full text summarization is a major task in TIPSTER
Phase III. TIPSTER Phase I sponsored
research in information extraction and information
retrieval, and supported the Message Understanding
Conferences (MUC) and Text REtrieval
Conferences (TREC) for evaluating extraction
and retrieval performance, respectively (Merchant,
1993). TIPSTER Phase II concentrated on
defining a common architecture to facilitate integration
of the two technologies. TIPSTER Phase
III continues to advance research in extraction and
retrieval, and adds text summarization in both the
research and formal evaluation arenas (Merchant,
1996). This proposed evaluation will be a formal,
large scale, multiple task, multiple system evaluation
independent from any single approach or
methodology.
As outlined in Table 1, the proposed evaluation
for text summarization will be task-based, judging
the utility of a summary to a particular task. It will
be an evaluation for users, determining fitness for
a particular purpose, versus an evaluation strictly
for developers. It is not intended to pick the best
systems, but to understand some of the issues
involved in building summarization systems and
evaluating them. It will provide an environment
whereby systems will be judged independently on
their applicability to a given task.
We will begin with at least two tasks for the first
evaluation, following the MUC and TREC examples
of testing along multiple dimensions. We
hope this will avoid any redirection of research
efforts based on relative performance on any
given task.
Additional tasks will be added in subsequent
years to evaluate other aspects of text summaries.
These tasks will also reflect continued maturation
of the technology.
4.1 Goals 
Automatic text summarization systems lend themselves
to many tasks. An informative summary
may be used as the basis for executive decisions.
An indicative summary may be used as an initial
indicator of relevance prior to reviewing the full
text of a document (and possibly eliminating the
need to view that full text). Summaries (used in
place of full text documents) may also be used to
improve precision in information retrieval systems,
since users would be searching only the
content-relevant words or phrases within a document
(Brandow et al., 1995). For this initial evaluation,
we will concentrate on tasks that appear to
offer the possibility of near term payoff for users.
We attempted to devise tasks that model the real
world activities of information analysts and consumers
of large quantities of text. These tasks
were designed based on interviews with users
who spend a majority of their workday searching
through volumes of on-line text for information
relevant to their area of interest.
We will begin with tasks that address the focus
(generic or user-directed) of the summaries. The
first task, categorization, will evaluate generic
summaries, and the other, adhoc retrieval, will
evaluate user-directed summaries, as described
below.
4.1.1 Task 1 - Categorization 
While information routing systems are becoming
prevalent in many work environments, there is
still a role in many such places for a central
review authority to scan and distribute all incoming
documents based on their content, essentially
performing a manual routing task. These reviewers
deal both with a broad topic base and with data
from multiple sources. They must browse a document
quickly to determine the key concepts, and
forward that document to the appropriate individual.
A related task involves scanning a large set of
documents that has been selected using an
extremely broad indicator or concept. A user will
browse through this data and categorize it according
to various parameters. For example, on the
World-Wide-Web (WWW), information seekers
frequently enter short, broad queries that return
hundreds or even thousands of documents. The
user must determine which documents represent
the greatest potential for providing information of
interest.
Integrating text summarization into each of the
above scenarios, the user would be presented a
generic summary in lieu of the full text, from
which he or she will make a categorization decision.
The evaluation task will simulate the manual
routing scenario described above. The goal will be
to decide quickly whether or not a document contains
information about any of a limited number
of topic areas. The document will be limited to a
single topic.
Selections from the TREC test collections of
query topics and documents will be used as the
data for the evaluation. We will select a minimum
of five distinct topics, approximately 200 documents
per topic. At least two of the topics will be
entity-based (i.e. based on the MUC categories of
person, location, and organization). The topics
will be related at a very broad level. The document
set provided will be that returned as a result
of five simple queries to a commonly used information
retrieval system, which should provide an
adequate mix of shorter and longer documents.
The resulting documents will be randomly mixed.
The TREC test collections are described in detail
in Harman (1993).
Only the documents will be provided to the
evaluation participants. Summarization systems
developed by the participants will automatically
generate a generic summary of each document.
There will not be any constraints on the format of
the summary. All summaries submitted by the
participants will be combined by the evaluation
organizers into a single group and randomly
mixed.
The full text of the document and the lead sentences
of the document (up to the specified cutoff
length) will be used as baselines. The summaries
provided by the participants, the baseline lead
summaries, and the full text documents will be
mixed together, resulting in N+2 versions of a single
document, where N is the number of evaluation
participants. This document set will be
randomly divided among the assessors. Assessors
for the evaluation will be professional information
analysts. Each assessor will read a summary or
document and categorize it into one of the five
topic areas that were selected by the organizers, or
'none of the above', which can be considered a
sixth category. No assessor will read more than
one version (summary or full-text) of a single document.
The assessor's decision-making process
will be timed. The assessor will then move on to
the next document or summary.
In addition to the TREC relevance judgments, a
minimum of two additional assessors will read all
of the full text documents to establish a ground
truth relevance decision for each.
The assessors will be timed, and their categorization
decisions will be compared to the ground
truth assessments. This methodology will assure
that the assessors' own categorization performance
can be measured along with the performance
of the summarization systems.
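The mixing scheme above, N+2 versions per document with no assessor seeing two versions of the same document, can be sketched as a random assignment. The scheme below is illustrative only; the actual protocol is not specified at this level of detail, and the version labels are invented.

```python
import random

def assign_versions(doc_ids, n_participants, n_assessors, seed=0):
    """For each document there are N participant summaries plus the
    lead-sentence baseline and the full text (N+2 versions).  Assign
    versions to assessors so that no assessor ever receives more than
    one version of any single document."""
    versions = ([f"participant_{i}" for i in range(1, n_participants + 1)]
                + ["lead_baseline", "full_text"])
    if n_assessors < len(versions):
        raise ValueError("need at least N+2 assessors")
    rng = random.Random(seed)
    assignments = {a: [] for a in range(n_assessors)}
    for doc in doc_ids:
        # Pick N+2 distinct assessors for this document's versions.
        chosen = rng.sample(range(n_assessors), len(versions))
        for assessor, version in zip(chosen, versions):
            assignments[assessor].append((doc, version))
    return assignments
```

Because each document's versions go to distinct assessors, no judgment of a summary is influenced by having already seen the full text of the same document.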
4.1.2 Task 2 - Adhoc retrieval 
Both the volume of data available on-line and the
prevalence of information retrieval engines have
created an immediate application for implementing
a text summarization filter as a back end to an
information retrieval engine, whereby the user
could quickly and accurately judge the relevancy
of documents returned as a result of a query. The
user's query has direct bearing on the content of
the documents returned.
Applying text summarization to the above scenario,
the user would be presented a summary
based on the query (a user-directed summary),
instead of the full text, from which he or she will
make a relevance assessment.
The second evaluation task will simulate the
adhoc retrieval scenario described above. The
goal will be to decide the relevancy of a retrieved
document by looking only at the user-directed
summary that has been generated by the system
under evaluation.
The TREC collection will also provide the common
test data used for this task in the same proportions
as for the categorization task: five hand-selected
topics and approximately 200 documents
for each topic. The document set provided will be
that returned as a result of five queries to a commonly
used information retrieval system. In this
case, both the topics and documents will be provided
to the participants. Summarization systems
developed by the participants will then automatically
generate a summary using the topic as the
indication of user interest. The full text and a keyword-in-context
(KWIC) list will be used as baselines.
Assessors will work with one topic at a time.
All summaries received from the participants for a
given topic, along with the full text and the KWIC
summaries, will be combined into a single group,
randomly mixed, and divided among the assessors.
Each assessor will review a topic, then read
each summary or document and judge whether or
not it is relevant to the topic at hand. The assessor
will then move on to the next topic. No assessor
will read more than one representation of a single
document.
In addition to the TREC relevance judgments, a
minimum of two additional assessors will read all
of the full text documents to establish a ground
truth relevance decision.
4.2 Evaluation Criteria 
Both evaluations highlight the acceptability of a
summary for a given task, with the assumption
that there is not a single 'correct' summary. The
main purpose will be to determine if the evaluator
would make the same decision if given the full
text, and how much longer it would take to make
that decision. The ideal outcome would be that the
decision could be made with the same accuracy in
shorter time, given the document summary. For
each task, we will record the time required to
make each decision, and the actual decision. The
decision for each evaluator will then be compared
to the relevance decision for the baselines. Analysis
of the results will include consideration of the
effects of summary length on the time taken to
make the relevance decision as well as its effects
on decision accuracy.
Quantitative measures
• Categorization/Relevance Decisions
Determining relevance to a given topic is an inherently
subjective activity. We intend to mitigate
this by using a sound statistical model to determine
the appropriate number of summaries to
evaluate, and by structuring the evaluation in such
a way as to avoid bias of any single assessor. As
previously discussed, we will establish low-end
and high-end baselines and use multiple assessors
to create ground truth decisions.
• Time Required
The time required to make a relevance or categorization
decision using a summary will be recorded
and compared with the time required to make the
same decision using the full text.
• Summary Length
In previous studies, 20-30% of full document
length was often used as an optimal cutoff length for
informative summaries, with the supposition that
indicative summaries would require far less information
((Brandow et al., 1995) and (Kupiec et al.,
1995)). For the initial evaluation, which will use
indicative summaries only, a document cutoff
length will be established at 10% of the original
document length. Any summary exceeding that
margin will be truncated.
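The 10% cutoff rule can be expressed in a few lines. One detail left open above is the unit of "length"; the sketch assumes word count, which is only one plausible reading.

```python
def truncate_summary(summary, document, cutoff=0.10):
    """Truncate a summary to at most `cutoff` of the original
    document's length, measured here in words (an assumption;
    characters or sentences would be alternative units)."""
    limit = max(1, int(len(document.split()) * cutoff))
    words = summary.split()
    return " ".join(words[:limit])
```

For a 100-word document this enforces a 10-word ceiling, so any system submitting longer summaries is judged only on its opening material.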
Qualitative measures
• User Preference
Evaluators will be asked to indicate whether they
prefer the full text or the summary as a basis for
decision-making. In addition to this qualitative
Task                                       Intent                     Focus                     Coverage                  Goal                                Quantitative measures
Index summaries for information retrieval  Indicative                 Generic                   Single document           Improve IR precision                Precision, Recall
Summarize across documents                 Indicative or Informative  Generic or User-directed  Multiple document         Reduce information processing load  Accuracy
Executive decision making                  Informative                Generic or User-directed  Single or multi-document  Include all relevant information    Key concept matching, Formatted questions

TABLE 2. Future Evaluations
assessment, the evaluator will be encouraged to
provide feedback as to why the summary was or
was not acceptable for a given task. This feedback
will then be made available for system developers.
It could also provide a basis for subsequent evaluations.
5 Future Direction of Evaluation 
This initial evaluation will address only a limited
number of issues involving automatic text summarization
technology. As we gain more experience
working with these systems and integrating
them into a user's work flow, the scope of the
evaluations will necessarily grow and change.
Some additional features and tasks to be
addressed potentially in future evaluations have
already been identified, including cohesiveness of
a summary, optimal length of a summary, and
multi-document summaries. Selected tasks are
outlined in Table 2 and described briefly below.
5.1 Tasks and Measures 
We are addressing two information retrieval types
of tasks during the first evaluation; however,
potential applications go beyond this limited
scope. One of the frequently mentioned uses of a
text summary is as a substitute for the document
during the indexing process of an information
retrieval system. The notion is that indexing based
on summaries would result in more precise retrievals
because only the key concepts and content-bearing
words would have been indexed. This
idea could be evaluated using standard precision
and recall information retrieval measures.
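Such an experiment would report the standard IR measures, computable directly from the system's retrieved set and a set of relevance judgments:

```python
def precision_recall(retrieved, relevant):
    """Standard IR precision and recall over document-id sets:
    precision = |retrieved ∩ relevant| / |retrieved|
    recall    = |retrieved ∩ relevant| / |relevant|"""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Comparing these numbers for a full-text index against a summary-based index would test the claim directly: the summary index should gain precision without losing too much recall.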
Summarizing across multiple documents is
another extremely useful application. While single
document summaries are expected to provide
improved efficiency for the end-user, much of the
information reviewed from one summary to the
next will be redundant. Automatically generated
summaries could result in even larger efficiency
gains and productivity improvements by distilling
the information from multiple documents into a
single summary. An evaluation of this type of
summary would be much more complex, possibly
comparing at a phrase-matching or key concept
level the combined factual information included
in a single summary with manually identified key
information in individual documents. The evaluation
would verify that the relevant aspects of key
facts across documents have been successfully
identified and combined in the resulting summary.
A third application could focus on a decision-making
task based on an informative summary.
An evaluation of this type of summary could
include filling out a template indicating key concepts
in a document, similar to the Paice and
Jones (1993) and Johnson et al. (1993) evaluations,
possibly augmented by a question/answer
measure based on the full text and the summary.
5.2 Data 
Newspaper articles, such as those which will be
used for the first evaluation, represent only a small
portion of the type of information available on-line.
A useful, effective summarizer should be
able to accept text in a variety of formats. With
each subsequent evaluation, new sources of data
will be added. These new sources could be news
feeds or web pages. They will tend to be less formatted,
vary greatly in length, and cover multiple
topics. At some point, we hope to introduce documents
in languages other than English for summarization
either into their native language or into
English.
6 Acknowledgments 
The author is grateful to Donna Harman and Beth
Sundheim for their support and assistance in
designing the evaluation.
The views expressed in this paper are those of
the author and do not necessarily reflect the views
of the Department of Defense or any of its agencies.

References 
Chinatsu Aone, Mary Ellen Okurowski, James Gorlinsky, and Bjornar Larsen. 1997. A Scalable Summarization System Using Robust NLP. In Proceedings of ACL-97, Madrid, Spain, July. To appear.

Ronald Brandow, Karl Mitze, and Lisa F. Rau. 1995. Automatic Condensation of Electronic Publications by Sentence Selection. Information Processing and Management, 31(5):675-685.

Kenneth W. Church and Lisa F. Rau. 1995. Commercial Applications of Natural Language Processing. Communications of the ACM, 38(11):71-79.

H. P. Edmundson. 1969. New Methods in Automatic Abstracting. Journal of the ACM, 16(2):264-285.

Brigitte Endres-Niggemeyer, Jerry Hobbs, and Karen Sparck Jones. 1993. Summarizing Text for Intelligent Communication. In Dagstuhl Seminar Report, IBFI GmbH, Schloss Dagstuhl, Wadern, Germany.

J. R. Galliers and Karen Sparck Jones. 1993. Evaluating Natural Language Processing Systems. University of Cambridge Computer Laboratory Technical Report No. 291, Computer Laboratory, University of Cambridge.

Donna Harman. 1993. Overview of the First Text REtrieval Conference (TREC-1). In TREC-2 Proceedings, Gaithersburg, Maryland.

Donna Harman. 1996. Overview of the Fourth Text REtrieval Conference (TREC-4). In The Fourth Text REtrieval Conference (TREC-4), pages 1-24, Gaithersburg, Maryland, 1995.

F. C. Johnson, C. D. Paice, W. J. Black, and A. P. Neal. 1993. The application of linguistic processing to automatic abstract generation. Journal of Document and Text Management, 1(3):215-241.

Daniel Knaus, Elke Mittendorf, Peter Schauble, and Patrick Sheridan. 1996. Highlighting Relevant Passages for Users of the Interactive SPIDER Retrieval System. In The Fourth Text REtrieval Conference (TREC-4), pages 233-238, Gaithersburg, Maryland, 1995.

Julian Kupiec, Jan Pedersen, and Francine Chen. 1995. A Trainable Document Summarizer. SIGIR '95, pages 68-73, Seattle, Washington, 1995.

H. P. Luhn. 1958. The Automatic Creation of Literature Abstracts. IBM Journal, pages 159-165.

Inderjeet Mani and Eric Bloedorn. 1997. Multi-document Summarization by Graph Search and Matching. In Proceedings of AAAI-97, Providence, Rhode Island, 1997. To appear.

Mark T. Maybury. 1993. Automated Event Summarization Techniques. In Dagstuhl Seminar Report, pages 100-108, IBFI GmbH, Schloss Dagstuhl, Wadern, Germany.

Kathleen McKeown and Dragomir R. Radev. 1995. Generating Summaries of Multiple News Articles. SIGIR '95, pages 74-82, Seattle, Washington.

Roberta Merchant. 1993. Tipster Program Overview. In Tipster Text Program, pages 1-2, Fredericksburg, Virginia.

Roberta Merchant. 1996. TIPSTER Phase III. In TIPSTER Text Phase III Kickoff Workshop, Columbia, Maryland, October.

Andrew H. Morris, George M. Kasper, and Dennis A. Adams. 1992. The Effects and Limitations of Automated Text Condensing on Reading Comprehension Performance. Information Systems Research, 3(1):17-35.

Seiji Miike, Etsuo Itoh, Kenji Ono, and Kazuo Sumita. 1994. A Full-Text Retrieval System with a Dynamic Abstract Generation Function. SIGIR '94, pages 152-161, Seattle, Washington.

C. D. Paice. 1990. Constructing Literature Abstracts by Computer: Techniques and Prospects. Information Processing and Management, 26(1):171-186.

Chris D. Paice and Paul A. Jones. 1993. The Identification of Important Concepts in Highly Structured Technical Papers. SIGIR '93, pages 69-77.

G. J. Rath, A. Resnick, and T. R. Savage. 1961. The Formation of Abstracts by the Selection of Sentences. American Documentation, pages 139-143.

U. Reimer and U. Hahn. 1988. Text Condensation as a Knowledge Base Abstraction. IEEE Conference on AI Applications, pages 338-344.

Tomek Strzalkowski. 1996. Robust Natural Language Processing and User-Guided Concept Discovery for Information Retrieval, Extraction, and Summarization: Tipster Phase III. In TIPSTER Text Phase III Kickoff Workshop, Columbia, Maryland, October.

Beth Sundheim. 1995. Overview of Results of the MUC-6 Evaluation. In Sixth Message Understanding Conference (MUC-6), pages 13-31, Columbia, Maryland.

Sarah Taylor. 1996. TIPSTER Text Program Overview. In TIPSTER Text Phase II, Tysons Corner, Virginia.
