Using Lexical Chains for Text Summarization 
Regina Barzilay 
Mathematics and Computer Science Dept.
Ben Gurion University in the Negev
Beer-Sheva, 84105 Israel
regina@cs.bgu.ac.il
Michael Elhadad 
Mathematics and Computer Science Dept.
Ben Gurion University in the Negev
Beer-Sheva, 84105 Israel
http://www.cs.bgu.ac.il/~elhadad
Abstract 
We investigate one technique to produce a summary of an original text
without requiring its full semantic interpretation, but instead relying
on a model of the topic progression in the text derived from lexical
chains. We present a new algorithm to compute lexical chains in a text,
merging several robust knowledge sources: the WordNet thesaurus, a
part-of-speech tagger and shallow parser for the identification of
nominal groups, and a segmentation algorithm derived from (Hearst, 1994).
Summarization proceeds in three steps: the original text is first
segmented, lexical chains are constructed, strong chains are identified
and significant sentences are extracted from the text. We present in this
paper empirical results on the identification of strong chains and of
significant sentences.

Introduction 
Summarization is the process of condensing a source text into a shorter
version preserving its information content. It can serve several goals,
from survey analysis of a scientific field to quick indicative notes on
the general topic of a text. Producing a quality informative summary of
an arbitrary text remains a challenge which requires full understanding
of the text. Indicative summaries, which can be used to quickly decide
whether a text is worth reading, are naturally easier to produce. In this
paper we investigate a method for the production of such indicative
summaries from arbitrary text.

(Jones, 1993) describes summarization as a two-step process: (1) building
from the source text a source representation; (2) summary generation,
that is, forming a summary representation from the source representation
built in the first step and synthesizing the output summary text.

Within this framework, the relevant question is what information has to
be included in the source representation in order to create a summary.
There are three types of source text information: linguistic, domain and
communicative. Each of these text aspects can be chosen as a basis for
source representation.

Summaries can be built on a deep semantic analysis of the source text.
For example, (McKeown and Radev, 1995) investigate ways to produce a
coherent summary of several texts describing the same event, when a
detailed semantic representation of the source texts is available (in
their case, they use MUC-style systems to interpret the source texts).

Alternatively, early summarization systems (Luhn, 1968) used only
linguistic source information. The intuition was that the most frequent
words represent the important concepts of the text. In this approach the
source representation was the frequency table of text words. This
representation abstracts the text into the union of its words without
considering any connection among them.

In contrast to these two extreme positions (using as a source
representation a full semantic representation of the text or reducing it
to a simple frequency table), we deal in this paper with the issue of
producing a summary from an arbitrary text without requiring its full
understanding, but using widely available knowledge sources. Our main
goal is therefore to find a middle ground for source representation, rich
enough to build quality indicative summaries, but easy enough to extract
from the source text to work on arbitrary text.

Over-simplification can harm the quality of the source representation. As
a trivial illustration, consider the following two sequences:
1. "Dr. Kenny has invented an anesthetic machine. This device controls
   the rate at which an anaesthetic is pumped into the blood."
2. "Dr. Kenny has invented an anesthetic machine. The Doctor spent two
   years on this research."

"Dr. Kenny" appears once in both sequences and so does "machine". But
sequence 1 is about the machine, and sequence 2 is about the "doctor".
This example indicates that if the source representation does not supply
information about semantically related terms, one cannot capture the
"aboutness" of the text, and therefore the summary will not capture the
main point of the original text.

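The limitation of a pure frequency table can be checked directly on the
two sequences above. The following is a minimal sketch, not part of the
original system; the tokenizer (lower-cased alphabetic tokens) and the
function name frequency_table are assumptions made for illustration.

```python
import re
from collections import Counter

def frequency_table(text):
    """Count lower-cased alphabetic tokens; no relation between words is kept."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

sequence_1 = ("Dr. Kenny has invented an anesthetic machine. "
              "This device controls the rate at which an anaesthetic "
              "is pumped into the blood.")
sequence_2 = ("Dr. Kenny has invented an anesthetic machine. "
              "The Doctor spent two years on this research.")

table_1 = frequency_table(sequence_1)
table_2 = frequency_table(sequence_2)

# Both tables give "kenny" and "machine" the same count, so the tables
# alone cannot tell which sequence is about the machine and which one
# is about the doctor.
for word in ("kenny", "machine"):
    print(word, table_1[word], table_2[word])
```

A representation that also linked "machine" to "device", or "Dr." to
"Doctor", would separate the two sequences; this is the role lexical
chains play in the rest of the paper.
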
The notion of cohesion, introduced in (Halliday and Hasan, 1976),
captures part of the intuition. Cohesion is a device for "sticking
together" different parts of the text. Cohesion is achieved through the
use of semantically related terms, reference, ellipsis and conjunctions.

Among these different means, the most easily identifiable and the most
frequent type is lexical cohesion (as discussed in (Hoey, 1991)). Lexical
cohesion is created by using semantically related words. Halliday and
Hasan classified lexical cohesion into a reiteration category and a
collocation category. Reiteration can be achieved by repetition, synonyms
and hyponyms. Collocation relations specify the relation between words
that tend to co-occur in the same lexical contexts (e.g., "She works as a
teacher in the School").

Collocation relations are more problematic for identification than
reiteration, but both of these categories are identifiable on the surface
of the text. Lexical cohesion occurs not only between two terms, but
among sequences of related words, called lexical chains (Morris and
Hirst, 1991). Lexical chains provide a representation of the lexical
cohesive structure of the text. Lexical chains have also been used for
information retrieval (Stairmand, 1996) and for correction of
malapropisms (Hirst and St-Onge, 1997 (to appear)). In this paper, we
investigate how lexical chains can be used as a source representation for
summarization.

Another important dimension of the linguistic structure of a source text
is captured under the related notion of coherence. Coherence defines the
macro-level semantic structure of a connected discourse, while cohesion
creates connectedness in a non-structural manner. Coherence is
represented in terms of coherence relations between text segments, such
as elaboration, cause and explanation. Some researchers, e.g., (Ono,
Kazuo, and Seiji, 1994), use discourse structure (encoded using RST (Mann
and Thompson, 1987)) as a source representation for summarization.
Clearly, this representation is expressive enough; the question is
whether it is computable. In contrast to lexical cohesion, coherence is
difficult to identify without complete understanding of the text and
complex inference. In addition, there are no precise criteria for
classification of different relations. Consider the following example
from Hobbs (1978): "John can open the safe. He knows the combination."

(Morris and Hirst, 1991) show that the relation between these two
sentences can be interpreted as elaboration or as explanation, depending
on "context, knowledge and beliefs".

There is, however, a close connection between discourse structure and
cohesion. Related words tend to co-occur within a discourse unit of the
text. So cohesion is one of the surface signs of discourse structure and
lexical chains can be used to identify it. Other signs can be used to
identify discourse structure as well (connectives, paragraph markers,
tense shifts).

In this paper, we investigate the use of lexical chains as a model of the
source text for the purpose of producing a summary. Obviously, other
aspects of the source text need to be integrated in the text
representation to produce quality summaries, but we want to empirically
investigate how far one can go exploiting mainly lexical chains. In the
rest of the paper we first present our algorithm for lexical chain
construction. We then present empirical results on the identification of
strong chains among the possible candidates produced by our algorithm.
Finally, we describe how lexical chains are used to identify significant
sentences within the source text and eventually produce a summary.

Algorithm for Chain Computing 
One of the chief advantages of lexical cohesion is that it is an easily
recognizable relation, enabling lexical chains computation. The first
computational model for lexical chains was presented in (Morris and
Hirst, 1991). They define lexical cohesion relations in terms of
categories, index entries and pointers in Roget's Thesaurus. Morris and
Hirst evaluated that their relatedness criterion covered over 90% of the
intuitive lexical relations. Chains are created by taking a new text word
and finding a related chain for it according to relatedness criteria.
Morris and Hirst introduce the notions of "activated chain" and "chain
returns", to take into account the distance between occurrences of
related words. They also analyze factors contributing to the strength of
a chain: repetition, density and length. Morris and Hirst did not
implement their algorithm, because there was no machine-readable version
of Roget's Thesaurus at the time.

One of the drawbacks of their approach was that they did not require the
same word to appear with the same sense in its different occurrences for
it to belong to a chain. For semantically ambiguous words, this can lead
to confusions (e.g., mixing two senses of table as a piece of furniture
or an array). Note that choosing the appropriate chain for a word is
equivalent to disambiguating this word in context, which is a well-known
difficult problem in text understanding.

More recently, two algorithms for the calculation of lexical chains have
been presented in Hirst and St-Onge (1995) and Stairmand (1996). Both of
these algorithms use the WordNet lexical database for determining
relatedness of the words (Miller et al., 1990). Senses in the WordNet
database are represented relationally by synonym sets ('synsets'), which
are the sets of all the words sharing a common sense. For example, two
senses of "computer" are represented as {calculator, reckoner, figurer,
estimator, computer} (i.e., a person who computes) and {computer, data
processor, electronic computer, information processing system}. WordNet
contains more than 118,000 different word forms. Words of the same
category are linked through semantic relations like synonymy and
hyponymy.

Polysemous words appear in more than one synset (for example, computer
occurs in two synsets). Approximately 17% of the words in WordNet are
polysemous. But, as noted by Stairmand, this figure is very misleading:
"a significant proportion of WordNet nouns are Latin labels for
biological entities, which by their nature are monosemous and our
experience with the news-report texts we have processed is that
approximately half of the nouns encountered are polysemous" (Stairmand,
1996).

Generally, a procedure for constructing lexical chains follows three
steps: (1) select a set of candidate words; (2) for each candidate word,
find an appropriate chain relying on a relatedness criterion among
members of the chains; (3) if it is found, insert the word in the chain
and update it accordingly.

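The three steps above can be sketched as a simple greedy loop. This is
only an illustration of the generic procedure: the RELATED_GROUPS table
is a hypothetical stand-in for a thesaurus such as WordNet, and opening a
new chain when no related chain exists is an assumption consistent with
the algorithms described below.

```python
# Toy relatedness resource: groups of related words. This is an assumption
# standing in for a real thesaurus; the groups are invented for the example.
RELATED_GROUPS = [
    {"machine", "device", "pump", "micro-computer"},
    {"doctor", "person", "patient"},
]

def related(word_a, word_b):
    """Two words are related if they are identical or share a toy group."""
    if word_a == word_b:
        return True
    return any(word_a in group and word_b in group for group in RELATED_GROUPS)

def build_chains(candidates):
    """Steps (2) and (3): for each candidate word, find the first chain
    holding a related member and insert the word there; if no chain is
    found, open a new chain for the word."""
    chains = []
    for word in candidates:
        for chain in chains:
            if any(related(word, member) for member in chain):
                chain.append(word)
                break
        else:
            chains.append([word])
    return chains

# Step (1) is assumed to have produced this candidate list.
chains = build_chains(["doctor", "machine", "device", "person", "pump", "research"])
print(chains)
```

Because each word is committed to the first matching chain, this sketch
is greedy and does no sense disambiguation.
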
An example of such a procedure was presented by Hirst and St-Onge (H&S).
In the preprocessing step, all words that appear as a noun entry in
WordNet are chosen. Relatedness of words is determined in terms of the
distance between their occurrences and the shape of the path connecting
them in the WordNet thesaurus. Three kinds of relation are defined:
extra-strong (between a word and its repetition), strong (between two
words connected by a WordNet relation) and medium-strong, when the link
between the synsets of the words is longer than one (only paths
satisfying certain restrictions are accepted as valid connections).

The maximum distance between related words depends on the kind of
relation: for extra-strong relations, there is no limit in distance; for
strong relations, it is limited to a window of seven sentences; and for
medium-strong relations, it is within three sentences back.

To find a chain in which to insert a given candidate word, extra-strong
relations are preferred to strong relations and both of them are
preferred to medium-strong relations. If a chain is found, then the
candidate word is inserted with the appropriate sense, and the senses of
the other words in the receiving chain are updated, so that every word
connected to the new word in the chain relates to its selected senses
only. If no chain is found, then a new chain is created and the candidate
word is inserted with all its possible senses in WordNet.

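The distance limits and the preference order described for H&S can be
combined in a small selection routine. This is a hedged sketch rather
than H&S's implementation: the function name pick_chain, the link tuples,
and the reading of "a window of seven sentences" as a sentence distance
of at most 7 are all assumptions.

```python
import math

# Maximum sentence distance allowed per relation kind, as stated in the text.
# Treating the seven-sentence window as "distance <= 7" is an assumption.
MAX_DISTANCE = {"extra-strong": math.inf, "strong": 7, "medium-strong": 3}

# Preference order, most preferred first.
PREFERENCE = ["extra-strong", "strong", "medium-strong"]

def pick_chain(word_sentence, links):
    """Choose a chain for a candidate word.

    word_sentence: sentence index of the candidate word.
    links: list of (chain_id, relation_kind, sentence_of_related_word).
    Returns the chain_id of the most preferred admissible link, or None
    when no link is within its distance limit (a new chain is then needed).
    """
    admissible = [
        (PREFERENCE.index(kind), chain_id)
        for chain_id, kind, sentence in links
        if word_sentence - sentence <= MAX_DISTANCE[kind]
    ]
    if not admissible:
        return None
    return min(admissible)[1]

# A strong link 8 sentences back is too far; a medium-strong link
# 2 sentences back is still admissible.
choice = pick_chain(10, [("A", "medium-strong", 8), ("B", "strong", 2)])
print(choice)
```

Ties between links of the same kind are broken here by chain identifier,
which is an arbitrary choice made only to keep the sketch deterministic.
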
The greedy disambiguation strategy implemented in this algorithm has some
limitations, illustrated by the following example:
Mr. Kenny is the person that invented an anaesthetic machine which uses
micro-computers to control the rate at which an anaesthetic is pumped
into the blood. Such machines are nothing new. But his device uses two
micro-computers to achieve much closer monitoring of the pump feeding the
anaesthetic into the patient.

According to H&S's algorithm, the chain for the word "Mr." is first
created: [lex "Mr.", sense {mister, Mr.}]. "Mr." belongs only to one
synset, so it is disambiguated from the beginning. The word "person" is
related to this chain in the sense "a human being" by a medium-strong
relation, so the chain now contains two entries:
[lex "Mr.", sense {mister, Mr.}]
[lex "person", sense {person, individual, someone, man, mortal, human,
soul}]
When the algorithm processes the word "machine", it relates it to this
chain, because "machine" in the first WordNet sense ("an efficient
person") is a holonym of "person" in the chosen sense. In other words,
"machine" and "person" are related by a strong relation. In this case,
"machine" is disambiguated in the wrong way, even though after this first
occurrence of "machine", there is strong evidence supporting the
selection of its more common sense: "micro-computer", "device" and "pump"
all point to its correct sense in this context, "any mechanical or
electrical device that performs or assists in the performance".

This example indicates that disambiguation cannot be a greedy decision.
In order to choose the right sense of the word, the 'whole picture' of
chain distribution in the text must be considered. We propose to develop
a chaining model according to all possible alternatives of word senses
and then choose the best one among them.

Let us illustrate this method on the above example. First, a node for the
word "Mr." is created: [lex "Mr.", sense {mister, Mr.}]. The next
candidate word is "person". It has two senses: "human being" (person-1)
and "grammatical category of pronouns and verb forms" (person-2). The
choice of sense for "person" splits the chain world into two different
interpretations, as shown in Figure 1.

Figure 1: Step 1, Interpretations 1 and 2

We define a component as a list of interpretations that are exclusive of
each other. Component words influence each other in the selection of
their respective senses.

The next candidate word, "anaesthetic", is not related to any word in the
first component, so we create a new component for it with a single
interpretation.

The word "machine" has 5 senses, machine-1 to machine-5. In its first
sense, "an efficient person", it is related to the senses "person" and
"Mr.". It therefore influences the selection of their senses, thus
"machine" has to be inserted in the first component. After its insertion
the picture of the first component becomes the one shown in Figure 2.

But if we continue the process and insert the words "micro-computer",
"device" and "pump", the number of alternatives greatly increases. The
strongest interpretations are given in Figures 3 and 4.

Under the assumption that the text is cohesive, we define the best
interpretation as the interpretation with the most connections (edges in
the graph). In this case, the second interpretation at the end of Step 3
is selected, which predicts the right sense for "machine". We define the
score of an interpretation as the sum of its chain scores. A chain score
is determined by the number and weight of the relations between chain
members. Experimentally, we fixed the weight of reiteration and synonym
to 10, of antonym to 7, and of hyperonym and holonym to 4. Our algorithm
develops all possible interpretations, maintaining each one without self
contradiction. When the number of possible interpretations is larger than
a certain threshold, we prune the weak interpretations according to this
criterion. In the end, we select from each component the strongest
interpretation.

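The scoring of interpretations can be sketched as follows. The weights
are the ones reported above; everything else is an assumption made for
illustration: an interpretation is modeled as a list of chains, each
chain as a plain list of relation kinds (its edges), and the two edge
lists at the bottom are hypothetical, not taken from Figures 3 and 4.

```python
# Relation weights reported in the text.
WEIGHTS = {"reiteration": 10, "synonym": 10, "antonym": 7,
           "hyperonym": 4, "holonym": 4}

def chain_score(relations):
    """Sum the weights of the relations (edges) inside one chain."""
    return sum(WEIGHTS[kind] for kind in relations)

def interpretation_score(interpretation):
    """Score of an interpretation: the sum of its chain scores."""
    return sum(chain_score(chain) for chain in interpretation)

def prune(component, max_size):
    """Keep only the max_size highest-scoring interpretations."""
    ranked = sorted(component, key=interpretation_score, reverse=True)
    return ranked[:max_size]

def strongest(component):
    """Final selection: the best interpretation of a component."""
    return max(component, key=interpretation_score)

# Hypothetical edge lists for two competing interpretations.
# Interpretation 1 attaches "machine" to the person chain; interpretation 2
# attaches it to the chain holding the device-like words.
interpretation_1 = [["hyperonym", "holonym"], ["hyperonym"]]
interpretation_2 = [["hyperonym"],
                    ["reiteration", "hyperonym", "hyperonym", "hyperonym"]]
component = [interpretation_1, interpretation_2]

print(interpretation_score(interpretation_1))
print(interpretation_score(interpretation_2))
```

With these made-up edges, the second interpretation wins because the
device-related words contribute more and heavier connections.
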
Figure 2: Step 2, Interpretations 1 to 4

In summary, our algorithm differs from H&S's algorithm in that it
introduces, in addition to the relatedness criterion for membership to a
chain, a non-greedy disambiguation heuristic to select the appropriate
senses of chain members.

The two algorithms differ in two other major aspects: the criterion for
the selection of candidate words and the operative definition of a text
unit.

We choose as candidate words simple nouns and noun compounds. As
mentioned above, nouns are the main contributors to the "aboutness" of a
text, and noun synsets dominate in WordNet. Both (Stairmand, 1996) and
H&S rely only on nouns as candidate words. In our algorithm, we rely on
the results of Brill's part-of-speech tagging algorithm to identify
nouns, while H&S do not go through this step and only select tokens that
happen to occur as nouns in WordNet.

Figure 3: Step 3, Interpretation 1
Figure 4: Step 3, Interpretation 2

In addition, we extend the set of candidate words to include noun
compounds. We first empirically evaluated the importance of noun
compounds by taking into account the noun compounds explicitly present in
WordNet (some 50,000 entries in WordNet are noun compounds such as "sea
level" or collocations such as "digital computer"). However, English
includes a productive system of noun compounds, and in each domain, new
noun compounds and collocations not present in WordNet play a major role.

We addressed the issue by using a shallow parser (developed by Ido
Dagan's team at Bar Ilan University) to identify noun compounds using a
simple characterization of noun sequences. This has two major benefits:
(1) it identifies important concepts in the domain (for example, in a
text on "quantum computing", the main token was the noun compound
"quantum computing", which was not present in WordNet); (2) it eliminates
words that occur as modifiers as possible candidates for chain
membership. For example, when "quantum computing" is selected as a single
unit, the word "quantum" is not selected. This is beneficial because in
this example, the text was not about "quantum", but more about computers.
When a noun compound is selected, the relatedness criterion in WordNet is
used by considering its head noun only. Thus, "quantum computer" is
related to "machine" as a "computer".

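The treatment of noun compounds can be illustrated with a small sketch.
It assumes the shallow parser has already returned the set of noun
sequences it recognizes as compounds; the function name candidate_units,
the longest-match strategy, and taking the last noun as the head are
assumptions for illustration.

```python
def candidate_units(nouns, compounds):
    """Turn a sequence of nouns into candidate units.

    nouns: nouns in text order.
    compounds: set of tuples of adjacent nouns recognized as compounds
               (assumed to come from a shallow parser).
    Returns a list of (unit, head) pairs. A compound is kept as a single
    unit, so its modifiers are not candidates on their own, and its head
    (assumed here to be the last noun) is the word used for the
    relatedness lookup.
    """
    max_len = max((len(c) for c in compounds), default=1)
    units = []
    i = 0
    while i < len(nouns):
        # Try the longest compound first.
        for size in range(max_len, 1, -1):
            window = tuple(nouns[i:i + size])
            if len(window) == size and window in compounds:
                units.append((" ".join(window), window[-1]))
                i += size
                break
        else:
            units.append((nouns[i], nouns[i]))
            i += 1
    return units

units = candidate_units(["quantum", "computer", "machine"],
                        {("quantum", "computer")})
print(units)
```

Here "quantum" never becomes a candidate by itself, and "quantum
computer" is looked up through its head "computer".
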
The second difference in our algorithm lies in the operative definition
we give to the notion of text unit. We use as text units the segments
obtained from Hearst's algorithm of text segmentation (Hearst, 1994). We
build chains in every segment according to relatedness criteria, and in a
second stage, we merge chains from the different segments using much
stronger criteria for connectedness: two chains are merged across a
segment boundary only if they contain a common word with the same sense.
Our intra-segment relatedness criterion is less strict: members of the
same synsets are related, a node and its offspring in the hyperonym graph
are related, siblings in the hyperonym graph are related only if the
length of the path is less than a threshold.

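The cross-segment merging rule can be sketched as follows. Chains are
modeled as sets of (word, sense) pairs, which is an assumption about the
data structure; the sense labels in the example are hypothetical.

```python
def merge_across_segments(chains):
    """Merge chains that share at least one (word, sense) pair.

    chains: iterable of chains, each an iterable of (word, sense) pairs,
            typically one list of chains per segment flattened together.
    Returns a list of disjoint sets of (word, sense) pairs.
    """
    merged = []
    for chain in chains:
        current = set(chain)
        kept = []
        for existing in merged:
            if existing & current:
                # Same word with the same sense: merge the two chains.
                current |= existing
            else:
                kept.append(existing)
        kept.append(current)
        merged = kept
    return merged

segment_chains = [
    [("network", "n1"), ("net", "n1")],        # segment 1
    [("system", "s1")],                        # segment 1
    [("network", "n1"), ("system", "s2")],     # segment 2
    [("system", "s1"), ("computer", "c1")],    # segment 3
]
result = merge_across_segments(segment_chains)
print(len(result))
```

Note that ("system", "s1") and ("system", "s2") do not trigger a merge:
the word is the same but the selected sense differs.
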
The relation between text segmentation and lexical chains is delicate,
since they are both derived from a partially common source of knowledge:
lexical distribution and repetitions. In fact, lexical chains could serve
as a basis for an algorithm for segmentation. We have found empirically,
however, that Hearst's algorithm behaves well on the type of texts we
checked and that it provides effectively a solid basis for lexical chains
construction.

Building Summaries Using Lexical Chains
We now investigate how lexical chains can serve as a source
representation of the original text to build a summary. The next question
is how to build a summary representation from this source representation.

The most prevalent discourse topic will play an important role in the
summary. We first present the intuition why lexical chains are a good
indicator of the central topic of a text. Given an appropriate measure of
strength, we show that picking the concepts represented by strong lexical
chains gives a better indication of the central topic of a text than
simply picking the most frequent words in the text (which forms the
zero-hypothesis).

For example, we show in the Appendix a sample text about Bayesian Network
technology. There, the concept of network was represented by the words
"network" with 6 occurrences, "net" with 2, and "system" with 4. But the
summary representation has to reflect that all these words represent the
same concept. Otherwise, the summary generation stage would extract
information separately for each term. The chain representation approach
avoids this problem completely, because all these terms occur in the same
chain, which reflects that they represent the same concept.

An additional argument for the chain representation as opposed to a
simple word frequency model is the case when a single concept is
represented by a number of words, each with relatively low frequency. In
the same Bayesian Network sample text, the concept of "information" was
represented by the words "information" (3), "datum" (2), "knowledge" (3),
"concept" (1) and "model" (1). In this text, "information" is a more
important concept than "computer", which occurs 4 times. Because the
"information" chain combines the number of occurrences of all its
members, it can overcome the weight of the single word "computer".

Scoring Chains 
In order to use lexical chains as outlined above, one must first identify
the strongest chains among all those that are produced by our algorithm.
As is frequent in summarization, there is no formal way to evaluate chain
strength (as there is no formal method to evaluate the quality of a
summary). We therefore rely on an empirical methodology. We have
developed an environment to compute and graphically visualize lexical
chains to evaluate experimentally how they capture the main topics of the
texts. Figure 5 shows how lexical chains are visualized to help human
testers evaluate their importance.

Figure 5: Visual representation of lexical chains

We have collected data for a set of 30 texts extracted from popular
magazines (from "The Economist" and "Scientific American"), all of them
of the popular science genre. For each text, we manually ranked chains in
terms of relevance to the main topics. We then computed different formal
measures on the chains, including chain length, distribution in the text,
text span covered by the chain, density, graph topology (diameter of the
graph of the words) and number of repetitions. The results on our data
set indicate that only the following parameters are good predictors of
the strength of a chain:
Length: The number of occurrences of members of the chain.
Homogeneity index: 1 - the number of distinct occurrences divided by the
length.

We designed a score function for chains as:
    Score(Chain) = Length * Homogeneity
When ranking chains according to their score, we evaluated that strong
chains are those which satisfy our "Strength Criterion":
    Score(Chain) > Average(Scores) + 2 * StandardDeviation(Scores)

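The score function and the Strength Criterion translate directly into
code. This is a minimal sketch: a chain is modeled as the list of its
member occurrences (assumed non-empty), and the use of the population
standard deviation (statistics.pstdev) is an assumption, since the text
does not say which estimator was used.

```python
from statistics import mean, pstdev

def chain_score(occurrences):
    """Score(Chain) = Length * Homogeneity.

    Length is the number of occurrences of chain members;
    Homogeneity is 1 - (number of distinct members / Length).
    """
    length = len(occurrences)
    distinct = len(set(occurrences))
    homogeneity = 1 - distinct / length
    return length * homogeneity

def strong_chains(scores):
    """Indices of chains whose score exceeds mean + 2 standard deviations."""
    threshold = mean(scores) + 2 * pstdev(scores)
    return [i for i, score in enumerate(scores) if score > threshold]

# The "network" chain of the sample text: network (6), net (2), system (4).
network_chain = ["network"] * 6 + ["net"] * 2 + ["system"] * 4
print(chain_score(network_chain))  # 9.0
```

For the network chain, Length is 12 and there are 3 distinct members, so
Homogeneity is 0.75 and the score is 9.0. A chain made of words that each
occur once has Homogeneity 0 and therefore score 0, whatever its length.
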
These are preliminary results but they are confirmed by our experience on
30 texts analyzed extensively. We have experimented with different
normalization methods for the score function, but they do not seem to
improve the results. We plan on extending the empirical analysis in the
future and to use formal learning methods to determine a good scoring
function.

The average number of strong chains selected by this selection method was
5 for texts of 1055 words on average (474 words minimum, 3198 words
maximum), when 32 chains were originally generated on average. The
strongest chains of the sample text are presented in the Appendix.

Extracting Significant Sentences 
Once strong chains have been selected, the next step of the summarization
algorithm is to extract full sentences from the original text based on
chain distribution.

We investigated three alternatives for this step.
Heuristic 1: For each chain in the summary representation, choose the
sentence that contains the first appearance of a chain member in the
text.
This heuristic produced the following summary for the text shown in the
Appendix:
When Microsoft Senior Vice President Steve Ballmer first heard his
company was planning to make a huge investment in an Internet service
offering movie reviews and local entertainment information in major
cities across the nation, he went to Chairman Bill Gates with his
concerns. Microsoft's competitive advantage, he responded, was its
expertise in Bayesian networks.
Bayesian networks are complex diagrams that organize the body of
knowledge in any given area by mapping out cause and effect relationships
among key variables and encoding them with numbers that represent the
extent to which one variable is likely to affect another.
Programmed into computers, these systems can automatically generate
optimal predictions or decisions even when key pieces of information are
missing.
When Microsoft in 1993 hired Eric Horvitz, David Heckerman and Jack
Breese, pioneers in the development of Bayesian systems, colleagues in
the field were surprised.

The problem with this approach is that all words in a chain reflect the
same concept, but to a different extent. For example, in the AI chain
(Appendix, Chain 3), the token "science" is related to the concept "AI",
but the words "AI" and "field" are more suitable to represent the main
topic "AI" in the context of the text. That is, not all chain members are
good representatives of the topic (even though they all contribute to its
meaning).

We therefore defined a criterion to evaluate the appropriateness of a
chain member to represent its chain, based on its frequency of occurrence
in the chain. We found experimentally that such words, call them
representative words, have a frequency in the chain no less than the
average word frequency in the chain. For example, in the third chain the
representative words are "field" and "AI".

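The representative-word criterion is a simple threshold on member
frequencies. The sketch below applies it to the "information" chain,
whose counts are given earlier in the text (the per-word counts of the AI
chain are not listed in this section); the function name and the
dictionary representation are assumptions.

```python
def representative_words(counts):
    """Members whose frequency in the chain is no less than the average
    word frequency in the chain.

    counts: mapping from chain member to its number of occurrences.
    """
    average = sum(counts.values()) / len(counts)
    return [word for word, count in counts.items() if count >= average]

# Counts of the "information" chain from the Bayesian Network sample text.
information_chain = {"information": 3, "datum": 2, "knowledge": 3,
                     "concept": 1, "model": 1}
print(representative_words(information_chain))
```

The chain has 10 occurrences over 5 members, so the average frequency is
2 and the representative words are "information", "datum" and
"knowledge".
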
Heuristic 2: We therefore defined a second heuristic based on the notion
of representative words: for each chain in the summary representation,
choose the sentence that contains the first appearance of a
representative chain member in the text.
In this special case this heuristic gives the same result as the first
one.

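Heuristics 1 and 2 differ only in the set of words that may trigger the
extraction, so one helper can illustrate both. This is a sketch under
simple assumptions: sentences are already split, matching is done on
lower-cased alphabetic tokens, and the three example sentences are
invented for the illustration.

```python
import re

def first_sentence_with(sentences, words):
    """Index of the first sentence containing any of the given words,
    or None if no sentence does."""
    targets = {word.lower() for word in words}
    for index, sentence in enumerate(sentences):
        tokens = set(re.findall(r"[a-z]+", sentence.lower()))
        if tokens & targets:
            return index
    return None

# Invented sentences, for illustration only.
sentences = [
    "Gates praised the science behind the product.",
    "The field was still an obscure enterprise.",
    "The nets bring together diverse elements of AI.",
]

all_members = {"science", "field", "AI"}     # Heuristic 1: any chain member
representatives = {"field", "AI"}            # Heuristic 2: representative words

heuristic_1 = first_sentence_with(sentences, all_members)
heuristic_2 = first_sentence_with(sentences, representatives)
print(heuristic_1, heuristic_2)
```

In this toy case the two heuristics disagree: the first picks the
sentence that merely mentions "science", the second skips it in favor of
the first sentence containing a representative word.
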
Heuristic 3: Often, the same topic is discussed in a number of places in
the text, so its chain is distributed across the whole text. Still, in
some text unit, this global topic is the central topic (focus) of the
segment. We try to identify this unit and extract sentences related to
the topic from this segment (or successive segments) only.

We characterize this text unit as a cluster of successive segments with
high density of chain members. Our third heuristic is based on this
approach: for each chain, find the text unit where the chain is highly
concentrated. Extract the sentence with the first chain appearance in
this central unit. Concentration is computed as the number of chain
member occurrences in a segment divided by the number of nouns in the
segment. A chain has high concentration if its concentration is the
maximum of all chains. A cluster is a group of successive segments such
that every segment contains chain members.

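The quantities used by the third heuristic can be sketched as follows.
Segments are modeled as lists of their nouns and chains as sets of member
words; these representations, the function names, the toy data, and the
reading of "maximum of all chains" as a per-segment comparison are
assumptions for illustration.

```python
def concentration(chain_words, segment_nouns):
    """Chain member occurrences in the segment divided by its noun count."""
    if not segment_nouns:
        return 0.0
    hits = sum(1 for noun in segment_nouns if noun in chain_words)
    return hits / len(segment_nouns)

def clusters(chain_words, segments):
    """Groups of successive segment indices in which every segment
    contains at least one chain member."""
    groups, current = [], []
    for index, nouns in enumerate(segments):
        if any(noun in chain_words for noun in nouns):
            current.append(index)
        elif current:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups

def dominant_segments(chain_name, chains, segments):
    """Segments where the named chain has a non-zero concentration that is
    the maximum over all chains (ties count as dominant)."""
    result = []
    for index, nouns in enumerate(segments):
        scores = {name: concentration(words, nouns)
                  for name, words in chains.items()}
        if scores[chain_name] > 0 and scores[chain_name] == max(scores.values()):
            result.append(index)
    return result

# Toy data: each segment is the list of its nouns.
segments = [
    ["network", "net", "company"],
    ["network", "system", "diagram", "variable"],
    ["system", "computer"],
    ["company", "stock"],
]
chains = {"network": {"network", "net", "system"}, "company": {"company"}}

print(clusters(chains["network"], segments))
print(dominant_segments("company", chains, segments))
```

The network chain forms one cluster over the first three segments, while
the company chain is split into two clusters and dominates only the last
segment.
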
Note that in all three techniques only one sentence is extracted for each
chain (regardless of its strength).

For most texts we tested, the first and second techniques produce the
same results, but when they are different, the output of the second
technique is better. Generally, the second technique produces the best
summary. We checked these methods on our 30 texts data set. Surprisingly,
the third heuristic, which intuition predicts as the most sophisticated,
gives the least indicative results. This may be due to several factors:
our criteria for 'centrality' or 'clustering' may be insufficient or,
more likely, the problem seems to be related to the interaction with text
structure. The third heuristic tends to extract sentences from the middle
of the text and to extract several sentences from distant places in the
text for a single chain. The complete results of our experiments are
available online at
http://www.cs.bgu.ac.il/summarization-test

Limitations and Future Work 
We have identified the following main problems with our method:
• Sentence granularity: all our methods extract whole sentences as single
  units. This has several drawbacks: long sentences have significantly
  higher likelihood to be selected, and they also include many
  constituents which would not have been selected on their own merit. The
  alternative is extremely costly: it involves some parsing of the
  sentences, the extraction of only the central constituents from the
  source text and the regeneration of a summary text using text
  generation techniques.
• Extracted sentences contain anaphora links to the rest of the text.
  This has been investigated and observed by (Black, 1994). Several
  heuristics have been proposed in the literature to address this
  problem: (Paice, 1990), (Paice and Husk, 1991) and (Black, 1994). The
  strongest seems to be to include together with the extracted sentence
  the one immediately preceding it. Unfortunately, when we select the
  first sentence in a segment, the preceding sentence does not belong to
  the paragraph and its insertion has a detrimental effect on the overall
  coherence of the summary. A preferable solution would be to replace
  anaphora with their referent, but again this is an extremely costly
  solution.
• Our method does not provide any way to control the length and level of
  detail of the summary. In all of the methods, we extract one sentence
  for each chain. The number of strong chains remains small (around 5 or
  6 for the texts we have tested, regardless of their length), and the
  remaining chains would introduce too much noise to be of interest in
  adding details. The best solution seems to be to extract more material
  for the strongest chains.

The method presented in this paper is obviously partial in that it only
considers lexical chains as a source representation, and ignores any
other clues that could be gathered from the text. Still, our first
informal evaluation indicates our results are of a quality superior to
that of summarizers usually employed in commercial systems such as search
systems on the World Wide Web on the texts we investigated. A large-scale
evaluation of the method and how sensitive it is to the quality of the
thesaurus and to its parameters is under way.

Bayesian Networks Text 
When Microsoft Senior Vice President Steve Ballmer first heard his company was
planning to make a huge investment in an Internet service offering movie reviews and
local entertainment information in major cities across the nation, he went to Chairman
Bill Gates with his concerns.
After all, Ballmer has billions of dollars of his own money in Microsoft stock, and
entertainment isn't exactly the company's strong point.
But Gates dismissed such reservations. Microsoft's competitive advantage, he
responded, was its expertise in Bayesian networks.
Asked recently when computers would finally begin to understand human speech,
Gates began discussing the critical role of "Bayesian" systems.
Ask any other software executive about anything Bayesian and you're liable to get a
blank stare.
Is Gates onto something? Is this alien-sounding technology Microsoft's new secret
weapon?
Bayesian networks are complex diagrams that organize the body of knowledge in any
given area by mapping out cause-and-effect relationships among key variables and
encoding them with numbers that represent the extent to which one variable is likely
to affect another. (...)
Programmed into computers, these systems can automatically generate optimal
predictions or decisions even when key pieces of information are missing.
When Microsoft in 1993 hired Eric Horvitz, David Heckerman and Jack Breese,
pioneers in the development of Bayesian systems, colleagues in the field were surprised.
The field was still an obscure, largely academic enterprise.
"Bayesian nets provide an overarching graphical framework that brings together
diverse elements of AI and increases the range of its likely application to the real world,"
says Michael Jordan, professor of brain and cognitive sciences at the Massachusetts
Institute of Technology.
Microsoft is unquestionably the most aggressive in exploiting the new approach. The
company offers a free Web service that helps customers diagnose printing problems
with their computers and recommends the quickest way to resolve them. Another
Web service helps parents diagnose their children's health problems. (...)
Horvitz, who with two colleagues founded Knowledge Industries to develop tools for
developing Bayesian systems, says he and the others left the company to join Microsoft
in part because they wanted to see their theoretical work more broadly applied.
Although the company did important work for the National Aeronautics and Space
Administration and on medical diagnostics, Horvitz says, "It's not like your grandmother
will use it."
Microsoft's activities in the field are now helping to build a groundswell of support for
Bayesian ideas.
"People look up to Microsoft," says Pearl, who wrote one of the key early texts on
Bayesian networks in 1988 and has become an unofficial spokesman for the field.
"They've given a boost to the whole area."
Microsoft is working on techniques that will enable the Bayesian networks to learn
or update themselves automatically based on new knowledge, a task that is currently
cumbersome.
Bayesian Network Text: the 
Strongest Chain 
The criterion is 3 ~8; here are the five strong chains:
CHAIN 1 Score = 14.0
microsoft 10, concern 1, company 6,
entertainment-service 1, enterprise 1,
massachusetts-institute 1
CHAIN 2 Score = 9.0
bayesian-system 2, system 2, bayesian-net 2,
network 1, bayesian-network 5, weapon 1
CHAIN 3 Score = 7.0
ai 2, artificial-intelligence /~,
field 7, technology 1, science 1
CHAIN 4 Score = 6.0
technique 1, bayesian-technique 1, condition 1,
datum 2, model 1, information 3, area 1,
knowledge 3
CHAIN 5 Score = 3.0
computer 4
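
As a sanity check on the listing above, the scores of chains 1 and 5 can be recomputed from their member counts. The sketch assumes the scoring function defined earlier in the paper, Score = Length × HomogeneityIndex, where Length is the total number of occurrences of chain members and HomogeneityIndex is 1 minus the number of distinct members divided by Length. The function name `chain_score` and the dictionary layout are illustrative choices, and the member counts are read from the listing as printed.

```python
from typing import Dict


def chain_score(members: Dict[str, int]) -> float:
    """Score = Length * HomogeneityIndex, with
    Length = total occurrences of all chain members and
    HomogeneityIndex = 1 - (distinct members / Length)."""
    length = sum(members.values())
    if length == 0:
        raise ValueError("a chain must contain at least one occurrence")
    distinct = len(members)
    homogeneity = 1.0 - distinct / length
    return length * homogeneity


# Member counts as read from the listing (chains 1 and 5 only).
chain_1 = {
    "microsoft": 10,
    "concern": 1,
    "company": 6,
    "entertainment-service": 1,
    "enterprise": 1,
    "massachusetts-institute": 1,
}
chain_5 = {"computer": 4}

score_1 = chain_score(chain_1)
score_5 = chain_score(chain_5)
```

With these counts, chain 1 has length 20 with 6 distinct members and chain 5 has length 4 with 1 distinct member, which reproduces the printed scores of 14.0 and 3.0 up to floating-point rounding.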
Acknowledgements 
This work has been supported by the Israeli Ministry
of Science. We are grateful to Graeme Hirst, Dragomir
Radev and Claude Bneson for their feedback on a previous
version.

References 
Black, William J. 1994. Parsing, linguistic resources and
semantic analysis for abstracting and categorization.

Halliday, Michael and Ruqaiya Hasan. 1976. Cohesion
in English. Longman, London.

Hearst, Marti A. 1994. Multi-paragraph segmentation
of expository text. In Proceedings of the 32nd Annual
Meeting of the Association for Computational Linguistics.

Hirst, Graeme and David St-Onge. 1997 (to appear).
Lexical chains as representation of context for the
detection and correction of malapropisms. In Christiane
Fellbaum, editor, WordNet: An electronic lexical
database and some of its applications. Cambridge,
MA: The MIT Press.

Hoey, M. 1991. Patterns of Lexis in Text. Oxford University
Press, Oxford.

Jones, Karen Sparck. 1993. What might be in a summary?
Information Retrieval.

Luhn, H. P. 1968. The automatic creation of literature
abstracts. In Schultz, editor, H. P. Luhn: Pioneer of
Information Science. Spartan.

Mann, W. C. and S. Thompson. 1987. Rhetorical structure
theory: description and construction of text
structures. In Gerard Kempen, editor, Natural Language
Generation: New Results in Artificial Intelligence,
Psychology and Linguistics. Martinus Nijhoff
Publishers, pages 85-96.

McKeown, Kathleen and Dragomir Radev. 1995. Generating
summaries of multiple news articles. In SIGIR
95 Proceedings.

Miller, George A., Richard Beckwith, Christiane Fellbaum,
Derek Gross, and Katherine J. Miller. 1990.
Introduction to WordNet: An on-line lexical database.
International Journal of Lexicography (special issue),
3(4):235-312.

Morris, J. and G. Hirst. 1991. Lexical cohesion computed
by thesaural relations as an indicator of the
structure of the text. Computational Linguistics,
17(1):21-45.

Ono, Kenji, Sumita Kazuo, and Miike Seiji. 1994. Abstract
generation based on rhetorical structure extraction.
In Proceedings of the International Conference
on Computational Linguistics (Coling 94), pages 344-348,
Japan.

Paice, C. D. and G. D. Husk. 1991. Towards the automatic
recognition of anaphoric features in English
text: The impersonal pronoun "it". Computer Speech
and Language, (2):109-132.

Paice, Chris D. 1990. Constructing literature abstracts
by computer: techniques and prospects. Information
Processing and Management, 26(1):171-186.

Stairmand, Mark A. 1996. A Computational Analysis of
Lexical Cohesion with Applications in Information Retrieval.
Ph.D. thesis, Center for Computational Linguistics,
UMIST, Manchester.
