An Efficient Text Summarizer Using Lexical Chains 
H. Gregory Silber and Kathleen F. McCoy
Computer and Information Sciences
University of Delaware
Newark, DE 19711
{silber, mccoy}@cis.udel.edu
Abstract
We present a system which uses lexical chains as an 
intermediate representation for automatic text sum- 
marization. This system builds on previous research 
by implementing a lexical chain extraction algorithm 
in linear time. The system is reasonably domain in- 
dependent and takes as input any text or HTML 
document. The system outputs a short summary 
based on the most salient concepts from the origi- 
nal document. The length of the extracted summary 
can be controlled either automatically or manually,
based on length or percentage of compression. While 
still under development, the system provides useful 
summaries which compare well in information con- 
tent to human generated summaries. Additionally, 
the system provides a robust test bed for future sum- 
mary generation research. 
1 Introduction 
Automatic text summarization has long been viewed 
as a two-step process. First, an intermediate repre- 
sentation of the summary must be created. Second, 
a natural language representation of the summary
must be generated using the intermediate representation
(Sparck Jones, 1993). Much of the early research
in automatic text summarization has involved
generation of the intermediate representation. The 
natural language generation problem has only recently
received substantial attention in the context
of summarization. 
1.1 Motivation 
In order to consider methods for generating natural 
text summaries from large documents, several issues 
must be examined in detail. First, the quality of the
intermediate representation for use in generation
must be analyzed. Second, a detailed
examination of the processes which link the inter- 
mediate representation to a potential final summary 
must be undertaken. 
The system presented here provides a useful first 
step towards these ends. By developing a robust and 
efficient tool to generate these intermediate repre- 
sentations, we can both evaluate the representation 
and consider the difficult problem of generating
natural language texts from the representation.
1.2 Background Research 
Much research has been conducted in the area of au- 
tomatic text summarization. Specifically, research 
using lexical chains and related techniques has re- 
ceived much attention. 
Early methods using word frequency counts did 
not consider the relations between similar words. 
Finding the aboutness of a document requires find- 
ing these relations. How these relations occur within 
a document is referred to as cohesion (Halliday 
and Hasan, 1976). First introduced by Morris and 
Hirst (1991), lexical chains represent lexical cohe- 
sion among related terms within a corpus. These 
relations can be recognized by identifying arbitrary 
size sets of words which are semantically related (i.e., 
have a sense flow). These lexical chains provide an 
interesting method for summarization because their 
recognition is easy within the source text and vast 
knowledge sources are not required in order to
compute them.
Later work using lexical chains was conducted by 
Hirst and St-Onge (1997) using lexical chains to cor- 
rect malapropisms. They used WordNet, a lexical 
database which contains some semantic information 
(http://www.cs.princeton.edu/wn). 
Also using WordNet in their implementation,
Barzilay and Elhadad (1997) dealt with some of the
limitations in Hirst and St-Onge's algorithm by ex- 
amining every possible lexical chain which could be 
computed, not just those possible at a given point 
in the text. That is to say, while Hirst and St-Onge
would compute the chain in which a word should 
be placed when a word was first encountered,
Barzilay and Elhadad computed every possible chain a
word could become a member of when the word was 
encountered, and later determined the best interpre- 
tation. 
2 A Linear Time Algorithm for
Computing Lexical Chains
2.1 Overview
Our research on lexical chains as an intermediate
representation for automatic text summarization
follows the research of Barzilay and Elhadad (1997).
We use their results as a basis for the utility of
the methodology. The most substantial difference is
that Barzilay and Elhadad create all possible chains
explicitly and then choose the best possible chain,
whereas we compute them implicitly.
2.2 Modifications to WordNet
As mentioned above, WordNet is a lexical database
that contains substantial semantic information. In
order to facilitate efficient access, the WordNet noun
database was re-indexed by line number as opposed
to file position and the file was saved in a binary
indexed format. The database access tools were then
rewritten to take advantage of this new structure.
The result of this work is that accesses to the
WordNet noun database can be accomplished an order
of magnitude faster than with the original
implementation. No additional changes to the WordNet
databases were made. The re-indexing also provided
a zero-based continuous numbering scheme that is
important to our linear time algorithm. This
importance will be noted below.
2.3 Our Algorithm
Step 1 For each word instance that is a noun
           For every sense of that word
               Compute all scored "meta-chains"
Step 2 For each word instance
           Figure out which "meta-chain"
           it contributes most to
           Keep the word instance in that chain
           and remove it from all other
           chains, updating the scores
           of each "meta-chain"
Figure 1: Basic linear time Algorithm for Computing
Lexical Chains
Our basic lexical chain algorithm is described
briefly in Figure 1. The algorithm takes a part of
speech tagged corpus and extracts the nouns. Using
WordNet to collect sense information for each of
these noun instances, the algorithm then computes
scored "meta-chains" based on the collected
information. A "meta-chain" is a representation of every
possible lexical chain that can be computed starting
with a word of a given sense. These meta-chains
are scored in the following manner. As each word
instance is added, its contribution, which is dependent
on the scoring metrics used, is added to the
"meta-chain" score. The contribution is then stored within
the word itself. These scores are dynamic and can
be set based on segmentation information, distance,
and type of relation.
            Intra     Intra     Adjacent   Other
            Pgrph.    Segment   Segment
Same        1         1         1          1
Synonym     1         1         0          0
Hypernym    1         1         0          0
Hyponym     1         1         0          0
Sibling     1         0         0          0
Table 1: Dynamic Scoring Metrics Set to Mimic
B+E's Algorithm
Currently, segmentation is accomplished prior to 
using our algorithm by executing Hearst's text tiler 
(Hearst, 1994). The sentence numbers of each seg- 
ment boundary are stored for use by our algorithm. 
These sentence numbers are used in conjunction 
with relation type as keys into a table of potential 
scores. Table 1 denotes sample metrics tuned to sim- 
ulate the system devised by Barzilay and Elhadad 
(1997). 
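The table lookup described above can be sketched as follows. This is only an illustration under stated assumptions, not the system's code: the names SCORES, distance_class, and score_contribution are hypothetical, each word position is assumed to be available as a (paragraph, segment) pair with document-wide paragraph numbers, and the values are those of Table 1.

```python
# Score table keyed by relation type, then by distance class (values from Table 1).
DISTANCES = ("intra_paragraph", "intra_segment", "adjacent_segment", "other")

SCORES = {
    "same":     dict(zip(DISTANCES, (1, 1, 1, 1))),
    "synonym":  dict(zip(DISTANCES, (1, 1, 0, 0))),
    "hypernym": dict(zip(DISTANCES, (1, 1, 0, 0))),
    "hyponym":  dict(zip(DISTANCES, (1, 1, 0, 0))),
    "sibling":  dict(zip(DISTANCES, (1, 0, 0, 0))),
}


def distance_class(pos_a, pos_b):
    """Classify two word positions, each given as (paragraph, segment).

    Assumption: paragraph numbers are document-wide and a segment is made
    of whole paragraphs, so equal paragraphs imply equal segments.
    """
    para_a, seg_a = pos_a
    para_b, seg_b = pos_b
    if seg_a == seg_b:
        return "intra_paragraph" if para_a == para_b else "intra_segment"
    if abs(seg_a - seg_b) == 1:
        return "adjacent_segment"
    return "other"


def score_contribution(relation, pos_a, pos_b):
    """Constant-time lookup of the score a related word pair contributes."""
    return SCORES[relation][distance_class(pos_a, pos_b)]
```

For example, two synonyms in different paragraphs of the same segment contribute 1 under these metrics, while two siblings at the same distance contribute 0. Because the metrics live in a table, they can be changed without touching the chaining code.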
At this point, the collection of "meta-chains"
contains all possible interpretations of the source
document. The problem is that in our final
representation, each word instance can exist in only one
chain. To figure out which chain is the correct one,
each word is examined using the score contribution
stored in Step 1 to determine which chain the given 
word instance contributes to most. By deleting the 
word instance from all the other chains, a represen- 
tation where each word instance exists in precisely 
one chain remains. Consequently, the sum of the 
scores of all the chains is maximal. This method is 
analogous to finding a maximal spanning tree in a 
graph of noun senses. These noun senses are all of 
the senses of each noun instance in the document. 
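The selection step just described can be sketched in a few lines. The sketch below covers Step 2 only and assumes that Step 1 has already recorded, for every word instance, the chains it could join together with a WordNet sense number and a stored contribution. The function name select_chains, the triple layout, and the sample chain names are hypothetical, and each chain is assumed to appear at most once per instance. Ties go to the lower sense number, as stated in the runtime analysis.

```python
from collections import defaultdict


def select_chains(candidates):
    """Keep each word instance only in the chain it contributes most to.

    candidates[i] lists, for word instance i, the
    (chain_id, sense_number, contribution) triples recorded in Step 1.
    Returns (chains, scores) for the chains that keep at least one member.
    """
    # After Step 1, a chain's score is the sum of all stored contributions.
    scores = defaultdict(float)
    for options in candidates:
        for chain_id, _sense, contribution in options:
            scores[chain_id] += contribution

    chains = defaultdict(list)
    for instance, options in enumerate(candidates):
        if not options:
            continue
        # Highest contribution wins; ties go to the lower sense number.
        best = min(options, key=lambda o: (-o[2], o[1]))
        chains[best[0]].append(instance)
        # Delete the instance from every other chain and update that score.
        for chain_id, _sense, contribution in options:
            if chain_id != best[0]:
                scores[chain_id] -= contribution

    return dict(chains), {c: scores[c] for c in chains}


# Toy data: three noun instances with made-up contributions.
candidates = [
    [("financial_institution", 1, 2.0), ("river_bank", 2, 1.0)],
    [("financial_institution", 1, 1.0)],
    [("river_bank", 1, 1.0), ("stream", 2, 1.0)],
]
chains, scores = select_chains(candidates)
```

Each instance is visited once and inspects only its own candidate list, whose length is bounded by a constant, so this pass stays linear in the number of noun instances.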
From this representation, the highest scored 
chains correspond to the important concepts in the 
original document. These important concepts can 
be used to generate a summary from the source 
text. Barzilay and Elhadad use the notion of strong 
chains (i.e., chains whose scores are in excess of two 
standard deviations above the mean of all scores) to 
determine which chains to include in a summary. 
Our system can use this method, as well as sev- 
eral other methods including percentage compres- 
sion and number of sentences. 
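The strong-chain criterion is simple to state in code. The following is a minimal sketch, assuming chain scores are held in a dictionary. The function name strong_chains and the parameter k are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which variant is meant.

```python
from statistics import mean, pstdev


def strong_chains(scores, k=2.0):
    """Return ids of chains scoring more than k standard deviations above the mean."""
    values = list(scores.values())
    if not values:
        return []
    # Assumption: population standard deviation over all chain scores.
    threshold = mean(values) + k * pstdev(values)
    return [chain for chain, score in scores.items() if score > threshold]


# Toy data: nine weak chains and one dominant chain.
example = {f"c{i}": 1 for i in range(9)}
example["topic"] = 10
selected = strong_chains(example)
```

With these toy scores the mean is 1.9 and the standard deviation is about 2.7, so only the chain with score 10 exceeds the threshold.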
For a more detailed description of our algorithm 
please consult our previous work (Silber and McCoy, 
2000). 
2.4 Runtime Analysis 
In this analysis, we will not consider the computa- 
tional complexity of part of speech tagging, as that is 
not the focus of this research. Also, because the size
and structure of WordNet does not change from
execution to execution of the algorithm, we shall take
these aspects of WordNet to be constant. We will
examine each phase of our algorithm to show that
the extraction of these lexical chains can indeed be
done in linear time. For this analysis, we define
constants from WordNet 1.6 as denoted in Table 2.
                                     Worst    Average
                                     Case     Case
C1 = No. of senses                   30       2
C2 = Parent/child isa relations      45147    14
C3 = No. of nouns in WordNet         94474    94474
C4 = No. of synsets in WordNet       66025    66025
C5 = No. of siblings                 397      39
C6 = Chains word can belong to       45474    55
Table 2: Constants from WordNet
Extracting information from WordNet entails 
looking up each noun and extracting all synset, Hy- 
ponym/Hypernym, and sibling information. The 
runtime of these lookups over the entire document 
is: 
n * (log(C3) + C1 * C2 + C1 * C5)
When building the graph of all possible chains, we 
simply insert the word into all chains where a rela- 
tion exists, which is clearly bounded by a constant 
(C6). The only consideration is the computation 
of the chain score. Since we store paragraph num- 
bers represented within the chain as well as segment 
boundaries, we can quickly determine whether the 
relations are intra-paragraph, intra-segment, or ad- 
jacent segment. We then look up the appropriate 
score contribution from the table of metrics. There- 
fore, computing the score contribution of a given 
word is constant. The runtime of building the graph 
of all possible chains is: 
n * C6 * 5
Finding the best chain is equally efficient. For 
each word, each chain to which it belongs is exam- 
ined. Then, the word is marked as deleted from 
all but the single chain whose score the word con- 
tributes to most. In the case of a tie, the lower sense 
number from WordNet is used, since this denotes a
more general concept. The runtime for this step is: 
n * C6 * 4
This analysis gives an overall worst case runtime
of:
n * (1548216 + log(94474) + 227370)
and an average case runtime of:
n * (326 + log(94474) + 275)
While the constants are quite large, the algorithm 
is clearly O(n) in the number of nouns in the original 
document. 
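The numeric constants in these totals can be re-derived from Table 2. The sketch below shows one grouping that reproduces the printed figures: the relation lookups plus chain selection, the index lookup, and graph building. The grouping is our reading of the totals, the function name per_noun_cost is hypothetical, and the base of the logarithm (taken here as 2) is an assumption, since the text writes log without a base.

```python
import math

# Constants from Table 2 (WordNet 1.6), as (worst case, average case).
C1 = (30, 2)          # senses per noun
C2 = (45147, 14)      # parent/child isa relations
C3 = (94474, 94474)   # nouns in WordNet
C5 = (397, 39)        # siblings
C6 = (45474, 55)      # chains a word can belong to


def per_noun_cost(case):
    """Return the three per-noun cost terms for case 0 (worst) or 1 (average)."""
    # Relation lookups (C1*C2 + C1*C5) plus best-chain selection (C6*4).
    relations_and_selection = C1[case] * C2[case] + C1[case] * C5[case] + 4 * C6[case]
    # Index lookup; log base 2 is an assumption.
    index_lookup = math.log2(C3[case])
    # Building the graph of all possible chains (C6*5).
    graph_building = 5 * C6[case]
    return relations_and_selection, index_lookup, graph_building


worst = per_noun_cost(0)
average = per_noun_cost(1)
```

Here worst[0] and worst[2] evaluate to 1548216 and 227370, and average[0] and average[2] to 326 and 275, matching the constants quoted above.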
At first glance, the constants involved seem
prohibitively large. Upon further analysis, however, we
see that most synsets have very few parent child
relations. Thus the worst case values may not reflect
the actual performance of our application. In ad- 
dition, the synsets with many parent child relations 
tend to represent extremely general concepts. These 
synsets will most likely not appear very often as a 
direct synset for words appearing in a document. 
2.5 User Interface
Our system currently can be used as a command 
line utility. The arguments allow the user to specify 
scoring metrics, summary length, and whether or 
not to search for collocations. Additionally, a web 
CGI interface has been added as a front end which 
allows a user to specify not just text documents, but 
HTML documents as well, and summarize them from
the Internet. Finally, our system has been attached 
to a search engine. The search engine uses data from 
existing search engines on the Internet to download 
and summarize each page from the results. These 
summaries are then compiled and returned to the 
user on a single page. The final result is that a 
search results page is returned with automatically 
generated summaries. 
2.6 Comparison with Previous Work 
As mentioned above, this research is based on the 
work of Barzilay and Elhadad (1997) on lexical 
chains. Several differences exist between our method 
and theirs. First and foremost, the linear run-time 
of our algorithm allows documents to be summarized 
much faster. Our algorithm can summarize a 40,000 
word document in eleven seconds on a Sun SPARC 
Ultra10 Creator. By comparison, our first version 
of the algorithm which computed lexical chains by 
building every possible interpretation like Barzilay 
and Elhadad took six minutes to extract chains from
5,000 word documents. 
The linear nature of our algorithm also has sev- 
eral other advantages. Since our algorithm is also 
linear in space requirements, we can consider all
possible chains. Barzilay and Elhadad had to prune
interpretations (and thus chains) which did not seem
promising. Our algorithm does not require pruning 
of chains. 
Our algorithm also allows us to analyze the
importance of segmentation. Barzilay and Elhadad
used segmentation to reduce the complexity of the 
problem of extracting chains. They basically built 
chains within a segment and combined these chains 
later when chains across segment boundaries shared 
a word in the same sense in common. While we in- 
clude segmentation information in our algorithm, it 
is merely because it might prove useful in disam- 
biguating chains. The fact that we can use it or not 
allows our algorithm to test the importance of
segmentation to proper word sense disambiguation. It
is important to note that on short documents, like 
those analyzed by Barzilay and Elhadad, segmen- 
tation appears to have little effect. There is some 
linguistic justification for this fact. Segmentation 
is generally computed using word frequencies, and 
our lexical chains algorithm generally captures the 
same type of information. On longer documents, 
our research has shown segmentation to have a much 
greater effect. 
3 Current Research and Future 
Directions 
Some issues which are not currently addressed by 
this research are proper name disambiguation and 
anaphora resolution. Further, while we attempt to 
locate two-word collocations using WordNet, a more 
robust collocation extraction technique is warranted. 
One of the goals of this research is to eventually 
create a system which generates natural language
summaries. Currently, the system uses sentence
selection as its method of generation. It is our
contention that regardless of how good an algorithm for
extracting sentences may be, it cannot possibly
create quality summaries. It seems obvious that
sentence selection will not create fluent, coherent text.
Further, our research shows that completeness is a 
problem. Because information extraction is only at 
the sentence boundary, information which may be 
very important may be left out if a highly com- 
pressed summary is required. 
Our current research is examining methods of us- 
ing all of the important sentences determined by our 
lexical chains algorithm as a basis for a generation 
system. Our intent is to use the lexical chains algo- 
rithm to determine what to summarize, and then a 
more classical generation system to present the in- 
formation as coherent text. The goal is to combine 
and condense all significant information pertaining 
to a given concept which can then be used in gener- 
ation. 
4 Conclusions 
We have described a domain independent
summarization engine which allows for efficient
summarization of large documents. The algorithm described is
clearly O(n) in the number of nouns in the original 
document. 
In their research, Barzilay and Elhadad showed 
that lexical chains could be an effective tool for
automatic text summarization (Barzilay and
Elhadad, 1997). By developing a linear time
algorithm to compute these chains, we have
produced a front end to a summarization system which
can be implemented efficiently. An operational
sample of this demo is available on the web at
http://www.eecis.udel.edu/~silber/research.htm.
While usable currently, the system provides a
platform for generation research on automatic text 
summarization by providing an intermediate repre- 
sentation which has been shown to capture impor- 
tant concepts from the source text (Barzilay and 
Elhadad, 1997). The algorithm's speed and
effectiveness allow research into summarization of larger
documents. Moreover, its domain independence al- 
lows for research into the inherent differences be- 
tween domains. 
5 Acknowledgements 
The authors wish to thank the Korean Government, 
Ministry of Science and Technology, whose funding, 
as part of the Bilingual Internet Search Machine 
Project, has made this research possible. Addition- 
ally, special thanks to Michael Elhadad and Regina 
Barzilay for their advice, and for generously making 
their data and results available. 

References 
Regina Barzilay and Michael Elhadad. 1997. Using
lexical chains for text summarization. Proceedings
of the Intelligent Scalable Text Summarization
Workshop, ACL, Madrid.
Michael Halliday and Ruqaiya Hasan. 1976. Cohesion
in English. Longman, London.
Marti A. Hearst. 1994. Multi-paragraph segmentation
of expository text. Proceedings of the 32nd
Annual Meeting of the ACL.
Graeme Hirst and David St-Onge. 1997. Lexical
chains as representation of context for the detection
and correction of malapropisms. WordNet:
An electronic lexical database and some of its
applications.
J. Morris and G. Hirst. 1991. Lexical cohesion computed
by thesaural relations as an indicator of
the structure of text. Computational Linguistics,
18:21-45.
H. Gregory Silber and Kathleen F. McCoy. 2000.
Efficient text summarization using lexical chains.
Conference on Intelligent User Interfaces 2000.
