A Formal Model of Text Summarization Based on 
Condensation Operators of a Terminological Logic 
Ulrich Reimer
Swiss Life, Information Systems Research Group
CH-8022 Zurich, Switzerland
reimer@swisslife.ch

Udo Hahn
Freiburg University, Computational Linguistics Group (CLIF)
D-79085 Freiburg, Germany
hahn@coling.uni-freiburg.de
Abstract 
We present an approach to text summarization that is entirely rooted in the formal description of a classification-based model of terminological knowledge representation and reasoning. Text summarization is considered an operator-based transformation process by which knowledge representation structures, as generated by the text understander, are mapped to conceptually condensed representation structures forming a text summary at the representation level. The framework we propose offers a variety of subtle parameters on which scalable text summarization can be based.
1 Introduction 
From its very beginning, the development of text understanding systems has been intimately tied to the field of knowledge representation and reasoning methods (Schank & Abelson 77). This close relationship was justified by the observation that any adequate form of text understanding not only requires grammatical knowledge about the particular language, but also, among others, has to incorporate knowledge about the domain the text deals with. Thus, the inferencing capabilities of knowledge representation languages were considered crucial for any adequate design of text understanding systems.
Out of this tradition a series of knowledge-based text summarization systems evolved, the methodology of which was almost exclusively based on the Schankian type of Conceptual Dependency (CD) representations (e.g., (Cullingford 78, Lehnert 81, DeJong 82, Dyer 83, Tait 85, Alterman 86)). CD representations, however, are formally underspecified representation devices lacking any serious formal foundation. Accordingly, the summarization operations these first-generation systems provide use only informal heuristics to determine the salient topics from the text representation structures for the purpose of summarization. A second generation of summarization systems then adopted a more mature knowledge representation approach, one based on the evolving methodological framework of hybrid, classification-based knowledge representation languages (cf. (Woods & Schmolze 92) for a survey). Among these systems count SUSY (Fum et al. 85), SCISOR (Rau 87), and TOPIC (Reimer & Hahn 88), but even in these frameworks no attempt was made to properly integrate the text summarization process into the formal reasoning mechanisms of the underlying knowledge representation language.
This is where our interest comes in. We propose here a model of text summarization that is entirely embedded in the framework of a classification-based model of terminological reasoning. Text summarization is considered a formally guided transformation process on knowledge representation structures, the so-called text knowledge base, as derived by a natural language text parser. The transformations involved inherit the formal rigor of the underlying knowledge representation model, as the corresponding summarization operators build on that model. Thus, our work describes a methodologically coherent, representation-theory-based approach to text summarization that has been lacking in the literature so far (for a survey, cf. (Hutchins 87)). Aside from these purely representational considerations, the terminological reasoning framework for the summarization model we propose offers a variety of subtle parameters on which scalable summarization processes can be based. This contrasts, in particular, with those approaches to text summarization which almost entirely rely upon built-in features of frame- and script-based representations and, consequently,
provide rather simple reduction heuristics in order to produce text summaries (e.g., (DeJong 82, Young & Hayes 85)). The formal model we present has been tested in TOPIC (Reimer & Hahn 88), a text summarization system which has been applied to expository texts in the domain of computer equipment as well as to various kinds of texts dealing with legal issues (company regulations, advisory texts, etc.).
This paper is organized as follows. In Section 2 we lay down a description of the syntax and semantics of the terminological logic which serves as the formal backbone for the specification of condensation operators on (text) knowledge bases. From this formal description we then turn to the formal model of text summarization in Section 3.
2 The Terminological Knowledge Representation Model
In the following, we describe a subset of a terminological logic (for an introduction to its underlying basic notational conventions, cf. (Woods & Schmolze 92)). Section 2.1 considers the terminological component, while Section 2.2 deals with appropriate extensions for representing text-specific knowledge.
2.1 The Basic Terminological Component
We distinguish two kinds of relations, namely properties and conceptual relationships. A property denotes a relation between individuals and string or integer values. A conceptual relationship denotes a relation between two individuals. The concept description language provides constructs to formulate necessary (and possibly sufficient) conditions on the properties and conceptual relationships every element of a concept class is required to have. The syntax of this language is given in Fig. 1.
    ⟨terminology⟩ ::= ⟨conc-intro⟩*
    ⟨conc-intro⟩  ::= ⟨conc-name⟩ ≤ ⟨c-expr⟩
    ⟨c-expr⟩      ::= (and ⟨c-expr⟩+) | ⟨conc-name⟩ |
                      (all-p ⟨prop-name⟩ ⟨prop-range⟩) |
                      (all-r ⟨rel-name⟩ ⟨conc-name⟩+) |
                      (exist-v ⟨prop-name⟩ ⟨value⟩) |
                      (exist-c ⟨rel-name⟩ ⟨conc-name⟩)
    ⟨conc-name⟩   ::= identifier

Figure 1: Syntax of a Terminological Logic
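To make the grammar of Figure 1 concrete, the following is a minimal Python sketch that reads concept expressions written in this syntax into nested tuples and checks the shape of each constructor. The tokenizer, the tuple encoding, and the function names tokenize, parse and check_c_expr are our own illustrative choices, not part of TOPIC; bracketed ranges such as [$200, $5000] are assumed to be single atoms.

```python
import re
from typing import Union

Expr = Union[str, tuple]

# Bracketed ranges such as "[$200, $5000]" are kept as single atoms (assumption).
TOKEN = re.compile(r"\[[^\]]*\]|[()]|[^\s()\[\]]+")

def tokenize(text: str) -> list:
    return TOKEN.findall(text)

def parse(tokens: list) -> Expr:
    """Parse one s-expression from the front of tokens, consuming it."""
    tok = tokens.pop(0)
    if tok != "(":
        return tok                      # atom: name, value, or range
    items = []
    while tokens[0] != ")":
        items.append(parse(tokens))
    tokens.pop(0)                       # drop the closing ")"
    return tuple(items)

def check_c_expr(e: Expr) -> None:
    """Raise ValueError unless e has the shape of a <c-expr> from Figure 1."""
    if isinstance(e, str):
        return                          # <conc-name>
    if not e:
        raise ValueError("empty expression")
    op, args = e[0], e[1:]
    atoms = all(isinstance(a, str) for a in args)
    if op == "and" and args:
        for a in args:
            check_c_expr(a)
    elif op in ("all-p", "all-r") and len(args) >= 2 and atoms:
        pass    # all-p: name + range(s) (Fig. 2 admits r1 ... rn); all-r: name + conc-name+
    elif op in ("exist-v", "exist-c") and len(args) == 2 and atoms:
        pass
    else:
        raise ValueError(f"not a <c-expr>: {e!r}")
```

For example, parse(tokenize("(exist-c has-part Cpu)")) yields the tuple ("exist-c", "has-part", "Cpu"), which check_c_expr accepts.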
Every constructor in Fig. 1 can be used to define a concept class (cf. Fig. 5). The all-p constructor introduces the class of individuals all of which have a certain property (whose value can vary from individual to individual). For example, (all-p price [$200,$5000]) denotes the class of individuals that have a property called 'price' with a value ranging between $200 and $5000. An individual can only have one value for each of its properties (cf. Fig. 2). The all-r constructor introduces a class of individuals that all participate in a certain kind of relationship to individuals from one of the concept classes given in the constructor. For example, (all-r equipped-with OperatingSystem ApplicationSoftware) denotes the class of individuals that are in a relationship called 'equipped-with' only to individuals of the class 'OperatingSystem' or the class 'ApplicationSoftware'. The distinction between the constructs all-p and all-r is uncommon in the domain of terminological logics (Woods & Schmolze 92), because primitive types like string and integer are usually considered to be concept classes as well. As we will see in Section 3, the terminological reasoning underlying the text condensation process exploits this distinction between properties and relationships.
The exist-v constructor introduces the class of individuals that all have a certain property value. For example, (exist-v weight 6.5lbs) denotes the class of individuals that have a property called 'weight' with the value '6.5lbs'. The exist-c constructor defines the class of individuals that have a conceptual relationship to at least one individual of a specific concept class. For example, (exist-c has-part Cpu) denotes the class of individuals that are in a relationship called 'has-part' to at least one individual of the class 'Cpu'. With the and constructor several class descriptions can be combined into one (cf. Fig. 5). The model-theoretic semantics of the terminological language we use is depicted in Fig. 2.
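The model-theoretic semantics can be illustrated by computing extensions over a small finite interpretation. The following Python sketch is our own illustration, not TOPIC code. It assumes that the domain D, the extensions of primitive concepts, and the pairs of each property and relationship are given as Python sets, and that each property range r1 ... rn is a predicate on values. The and constructor is read as intersection, which is the usual conjunctive reading of "combining class descriptions".

```python
from dataclasses import dataclass

@dataclass
class Interpretation:
    domain: set      # D
    concepts: dict   # concept name -> set of individuals
    roles: dict      # property or relationship name -> set of (x, y) pairs

    def fillers(self, x, role):
        return {y for (a, y) in self.roles.get(role, set()) if a == x}

    def ext(self, e):
        """Extension of a concept expression encoded as nested tuples."""
        if isinstance(e, str):
            return self.concepts.get(e, set())
        op, args = e[0], e[1:]
        if op == "and":                   # assumed: intersection of the conjuncts
            result = set(self.domain)
            for a in args:
                result &= self.ext(a)
            return result
        if op == "all-p":                 # exactly one value, lying in r1 u ... u rn
            prop, ranges = args[0], args[1:]
            return {x for x in self.domain
                    if len(self.fillers(x, prop)) == 1
                    and all(any(r(y) for r in ranges) for y in self.fillers(x, prop))}
        if op == "all-r":                 # some filler exists; all fillers in c1 u ... u cn
            rel, cs = args[0], args[1:]
            allowed = set().union(*(self.ext(c) for c in cs))
            return {x for x in self.domain
                    if self.fillers(x, rel) and self.fillers(x, rel) <= allowed}
        if op == "exist-v":
            prop, v = args
            return {x for x in self.domain if v in self.fillers(x, prop)}
        if op == "exist-c":
            rel, c = args
            return {x for x in self.domain if self.fillers(x, rel) & self.ext(c)}
        raise ValueError(f"unknown constructor {op!r}")
```

Note how all-p and all-r differ only in what their fillers are checked against: value predicates for properties, concept extensions for relationships, which mirrors the property/relationship distinction drawn above.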
2.2 Representing Text Knowledge 
TOPIC's text parser heavily relies on terminological knowledge about the domain the texts deal with (Hahn 89). In the course of text analysis, the parser extends this domain knowledge incrementally by new concept definitions. In order to distinguish between prior domain knowledge and newly acquired text knowledge, we extend our basic terminological language with the constructs specified in Fig. 3. The operator ≤_T indicates a primitive concept originating from the text analysis. Only a limited number of constructs can be used for such a concept definition; they correspond to the kinds of knowledge the parser can extract from a text (see Fig. 5):
• A new concept can only be acquired when the text makes a reference to a superordinate concept already known in the domain knowledge. Thus, the concept expression on the right-hand side of the ≤_T construct must comprise a reference to a superordinate concept, as expressed by the syntax in Fig. 3.
• Properties of a new concept can be learned (exist-v construct).
• Relationships to other concepts can be learned (exist-c construct) in case the relationship range is already defined by a corresponding all-r construct.

$$
\begin{aligned}
\varepsilon[c] &\subseteq \varepsilon[\mathit{cexpr}], \quad \text{if } c \le \mathit{cexpr}\\
\varepsilon[(\textsf{all-p}\ \mathit{prop}\ r_1 \ldots r_n)] &= \{x \in D \mid \lVert\{y \in D \mid (x,y) \in \varepsilon[\mathit{prop}]\}\rVert = 1 \;\wedge\; \forall y\,((x,y) \in \varepsilon[\mathit{prop}] \Rightarrow y \in \varepsilon[r_1] \cup \ldots \cup \varepsilon[r_n])\}\\
\varepsilon[(\textsf{all-r}\ \mathit{rel}\ c_1 \ldots c_n)] &= \{x \in D \mid \exists y\,(x,y) \in \varepsilon[\mathit{rel}] \;\wedge\; \forall y\,((x,y) \in \varepsilon[\mathit{rel}] \Rightarrow y \in \varepsilon[c_1] \cup \ldots \cup \varepsilon[c_n])\}\\
\varepsilon[(\textsf{exist-v}\ \mathit{prop}\ v)] &= \{x \in D \mid (x,v) \in \varepsilon[\mathit{prop}]\}\\
\varepsilon[(\textsf{exist-c}\ \mathit{rel}\ c)] &= \{x \in D \mid \exists y\,((x,y) \in \varepsilon[\mathit{rel}] \wedge y \in \varepsilon[c])\}
\end{aligned}
$$

Figure 2: Model-Theoretic Semantics of the Constructs from Figure 1

    ⟨tconc-intro⟩ ::= ⟨conc-name⟩ ≤_T (and ⟨conc-name⟩ ⟨tc-expr⟩+)
    ⟨tc-expr⟩     ::= (exist-v ⟨prop-name⟩ ⟨value⟩ ⟨flag⟩) |
                      (exist-c ⟨rel-name⟩ ⟨conc-name⟩ ⟨flag⟩) |
                      (ccount ⟨aweight⟩) |
                      (pcount ⟨prop-name⟩ ⟨aweight⟩) |
                      (rcount ⟨rel-name⟩ ⟨conc-name⟩ ⟨aweight⟩)

Figure 3: Additional Terminological Constructs for Representing Text Knowledge
The text-knowledge-specific versions of the exist-v and exist-c constructs have an additional argument which serves as a flag that is set whenever one of these constructs is added to a concept description (i.e., when the associated property or relationship has been learned). The text condensation component of TOPIC makes use of this flag in order to determine those facts which have been learned since a certain reference point (where all flags were set to 0).
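The flag mechanism can be sketched as follows. This is a hypothetical in-memory stand-in for a text concept description, not TOPIC's data structure: the class TextConcept and its methods are our own names, and exist-v and exist-c facts are stored as dictionaries from (property, value) or (relationship, concept) pairs to their flag.

```python
from dataclasses import dataclass, field

@dataclass
class TextConcept:
    name: str
    superconcept: str
    exist_v: dict = field(default_factory=dict)   # (prop, value) -> flag
    exist_c: dict = field(default_factory=dict)   # (rel, concept) -> flag

    def learn_value(self, prop, value):
        self.exist_v[(prop, value)] = 1      # flag set when the construct is added

    def learn_relation(self, rel, concept):
        self.exist_c[(rel, concept)] = 1

    def reset_flags(self):
        """Establish a new reference point: all flags are set to 0."""
        for key in self.exist_v:
            self.exist_v[key] = 0
        for key in self.exist_c:
            self.exist_c[key] = 0

    def learned_since_reset(self):
        """Facts whose flag is still set, i.e. learned after the reference point."""
        return ({k for k, f in self.exist_v.items() if f != 0},
                {k for k, f in self.exist_c.items() if f != 0})
```

Resetting keeps the learned facts themselves and only clears their flags, so a fact learned before the reference point stays in the description but is no longer reported as new.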
Besides acquiring new domain knowledge from a text, the parser performs book-keeping activities in order to record how often a concept, a property of a concept, or a relationship to another concept is explicitly or implicitly mentioned in the text. For this purpose, we provide the constructs ccount, pcount, and rcount for concept descriptions. These constructs belong to the text knowledge and can be applied to concept descriptions derived from the text as well as to concepts of the domain knowledge. The ccount (pcount) construct indicates how often (a property of) a concept has been mentioned, whereas (rcount rel conc aweight) indicates how often the relationship rel to a concept conc has been referred to. We call the numbers introduced by the count operators activation weights. An (rcount rel conc aweight) construct can only occur as part of a text concept description when it also contains a construct (all-r rel c1 ... cn) where conc is subsumed by one of the ci. If this is not the case, rcount refers to a concept that lies outside the range of the relationship rel; thus, the rcount statement would make no sense. Since none of the count constructs (nor the flags) make an assertion about the meaning of the concepts involved, they have no influence on the concepts' extension (cf. Fig. 4). Fig. 5 illustrates the application of multiple knowledge base operations resulting in the text knowledge representation for the newly learned concept 'Notebooster' as a specialization of 'Notebook'.
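The well-formedness condition on rcount can be checked mechanically. The sketch below is illustrative only and rests on two assumptions of ours: parents maps each concept to its direct superconcepts, and all-r constraints are inherited from superconcepts (as Notebooster in Fig. 5 obtains its all-r constructs only through Notebook). The names superconcepts, subsumed_by and rcount_ok are hypothetical.

```python
def superconcepts(c, parents):
    """c itself plus all (transitive) superconcepts of c."""
    seen, todo = set(), [c]
    while todo:
        x = todo.pop()
        if x not in seen:
            seen.add(x)
            todo.extend(parents.get(x, ()))
    return seen

def subsumed_by(c, d, parents):
    return d in superconcepts(c, parents)

def rcount_ok(concept, rel, conc, parents, all_r):
    """True iff some (all-r rel c1 ... cn) visible at concept admits conc.

    all_r maps a concept to {rel: [c1, ..., cn]} as stated in its own definition.
    """
    for sup in superconcepts(concept, parents):
        for ci in all_r.get(sup, {}).get(rel, ()):
            if subsumed_by(conc, ci, parents):
                return True
    return False
```

For instance, with LeadingEdgeTech below Manufacturer, (rcount manufactured-by LeadingEdgeTech 1) in Notebooster passes, whereas an rcount pointing manufactured-by at MS-DOS would be rejected.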
3 Text Knowledge Condensation 
The text condensation process examines the text knowledge base generated by the parser to determine certain distributions of activation weights, patterns of property and relationship assignments to concept descriptions, and particular connectivity patterns of active concepts in the concept hierarchy. These constitute the basis for the construction of thematic descriptions as the result of text condensation. Only the most significant concepts, relationships and properties (hereafter called salient) are considered as part of a topic description (cf. Section 3.1). Thus, text condensation (or, equally, text summarization) can be considered an abstraction process on (text) knowledge bases.
A topic description is a combination of salient concepts, relationships and properties of a formal text unit. The computation of these concepts is started only at certain well-defined intervals. In the sublanguage domain of expository texts, at least, topic shifts occur predominantly at paragraph boundaries. Therefore, text condensation is started at the end of every paragraph so that thematic overlaps as well as topic breaks between adjacent paragraphs can be detected and the extension of a topic can be exactly delimited. The condensation process yields a set of topic descriptions, each one characterizing one or more adjacent paragraphs of the text (cf. Section 3.2). Finally, the entire collection of topic descriptions of a single text can be generalized in terms of a hierarchical text graph (cf. Section 3.3), the representation form of a text summary.

$$
\begin{aligned}
\varepsilon[c] &\subseteq \varepsilon[\mathit{cexpr}], \quad \text{if } c \le_T \mathit{cexpr}\\
\varepsilon[(\textsf{ccount}\ n)] &= D\\
\varepsilon[(\textsf{pcount}\ \mathit{prop}\ n)] &= D\\
\varepsilon[(\textsf{rcount}\ \mathit{rel}\ c\ n)] &= D\\
\varepsilon[(\textsf{exist-v}\ \mathit{prop}\ v\ f)] &= \varepsilon[(\textsf{exist-v}\ \mathit{prop}\ v)]\\
\varepsilon[(\textsf{exist-c}\ \mathit{rel}\ c\ f)] &= \varepsilon[(\textsf{exist-c}\ \mathit{rel}\ c)]
\end{aligned}
$$

Figure 4: Model-Theoretic Semantics of the Constructs from Figure 3

Domain Knowledge (Definition of a Concept Class)

    Notebook ≤ (and (all-r manufactured-by Manufacturer)
                    (exist-c has-part Cpu) (exist-c has-part RAM1)
                    (exist-c has-part HardDisk1)
                    (all-p weight [1lb, 15lbs]) (all-p price [$200, $5000])
                    (all-r equipped-with OperatingSystem ApplicationSoftware)
                    (exist-c equipped-with MS-DOS))
    RAM1 ≤ (and (all-p size [1MB, 64MB]))
    HardDisk1 ≤ (and (all-p size [100MB, 1GB]))

Text Knowledge

    Notebooster ≤_T (and Notebook (ccount 12)
                         (exist-c manufactured-by LeadingEdgeTech 1)
                         (rcount manufactured-by LeadingEdgeTech 1)
                         (exist-c has-part 486SL 1) (rcount has-part 486SL 3)
                         (exist-c has-part RAM1-1 1) (rcount has-part RAM1-1 2)
                         (rcount equipped-with MS-DOS 2)
                         (exist-v weight 6.5lbs 1) (pcount weight 1))
    RAM1-1 ≤_T (and RAM1 (ccount 1)
                    (exist-v size 8MB 1) (pcount size 1))

Figure 5: Knowledge Representation Structures Resulting from Text Parsing
3.1 Condensation Operators 
We apply several operators to text knowledge bases to determine which concepts, properties, and relationships play a dominant role in the corresponding texts and thus should become part of their topic description. All of these operators are grounded in the semantics of the underlying terminological logic. Some of the operators make additional use of cut-off values which are heuristically motivated and have been evaluated empirically.
Salient Concepts: 
There are several criteria to determine salient concepts. The simplest, least "knowledgeable" criterion considers all those concepts salient whose activation weight exceeds the average activation weight of all active concepts.¹ A second criterion renders a concept salient if the total sum of references made to its properties and to its relationships to other concepts is greater than it is, on the average, the case for all other active concepts. (SC1) exploits the structure of the aggregation hierarchy and evaluates it by the associated activation weights (for the definitions of the sets and functions we use below, cf. Table 1).
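(SC1) c is a salient concept iff

$$
\sum_{rp_i \in R \cup P} \mathit{rpcount}(c, rp_i) \;>\; \frac{\sum_{c_j \in AC} \sum_{rp_i \in R \cup P} \mathit{rpcount}(c_j, rp_i)}{\lVert AC \rVert}
$$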
While (SC1) checks the total number of references made to any property or relationship, (SC2) is concerned with the number of different properties and relationships mentioned.
¹ Throughout the paper, we call a concept c an active one if ccount(c) > 0 (cf. Table 1).
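$$
\begin{aligned}
\mathit{ccount}(c) &= \begin{cases} n, & \text{if } c \le (\textsf{and}\ \ldots (\textsf{ccount}\ n) \ldots) \text{ or } c \le_T (\textsf{and}\ \ldots (\textsf{ccount}\ n) \ldots)\\ 0, & \text{else} \end{cases}\\
\mathit{rpcount}(c, rp) &= \begin{cases} \sum_{c' \in C} \mathit{rcount}(c, rp, c'), & \text{if } rp \in R\\ \mathit{pcount}(c, rp), & \text{if } rp \in P \end{cases}\\
\mathit{rcount}(c, \mathit{rel}, c') &= \begin{cases} n, & \text{if } c \le (\textsf{and}\ \ldots (\textsf{rcount}\ \mathit{rel}\ c'\ n) \ldots)\\ n, & \text{if } c \le_T (\textsf{and}\ \ldots (\textsf{rcount}\ \mathit{rel}\ c'\ n) \ldots)\\ 0, & \text{else} \end{cases}\\
\mathit{pcount}(c, \mathit{prop}) &= \begin{cases} n, & \text{if } c \le (\textsf{and}\ \ldots (\textsf{pcount}\ \mathit{prop}\ n) \ldots)\\ n, & \text{if } c \le_T (\textsf{and}\ \ldots (\textsf{pcount}\ \mathit{prop}\ n) \ldots)\\ 0, & \text{else} \end{cases}\\
\mathit{rpactive}(c, rp) &= \begin{cases} 1, & \text{if } \mathit{rpcount}(c, rp) > 0\\ 0, & \text{else} \end{cases}\\
\mathit{existcount}(c, rp) &= \begin{cases} \sum_{c' \in C} \mathit{existc}(c, rp, c'), & \text{if } rp \in R\\ \sum_{v \in V} \mathit{existv}(c, rp, v), & \text{if } rp \in P \end{cases}\\
\mathit{existc}(c, \mathit{rel}, c') &= \begin{cases} 1, & \text{if } c \le_T (\textsf{and}\ \ldots (\textsf{exist-c}\ \mathit{rel}\ c'\ f) \ldots) \wedge f \neq 0\\ 0, & \text{else} \end{cases}\\
\mathit{existv}(c, \mathit{prop}, v) &= \begin{cases} 1, & \text{if } c \le_T (\textsf{and}\ \ldots (\textsf{exist-v}\ \mathit{prop}\ v\ f) \ldots) \wedge f \neq 0\\ 0, & \text{else} \end{cases}\\
\textit{is-a}(c_1, c_2) &\Leftrightarrow c_1 \le c_2 \,\vee\, c_1 \le_T c_2 \,\vee\, c_1 \le (\textsf{and}\ \ldots c_2 \ldots) \,\vee\, c_1 \le_T (\textsf{and}\ \ldots c_2 \ldots)\\
C &= \{c \mid c \le \mathit{cexpr} \text{ or } c \le_T \mathit{cexpr} \text{ is part of the knowledge base}\}\\
AC &= \{c \mid c \in C \wedge \mathit{ccount}(c) > 0\}
\end{aligned}
$$

V = the set of all property values occurring in the knowledge base
P = the set of all properties occurring in the knowledge base
R = the set of all relationships occurring in the knowledge base

Table 1: Auxiliary Set and Function Definitions for Salience Computation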
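(SC2) c is a salient concept iff

$$
\sum_{rp_i \in R \cup P} \mathit{rpactive}(c, rp_i) \;>\; \frac{\sum_{c_j \in AC} \sum_{rp_i \in R \cup P} \mathit{rpactive}(c_j, rp_i)}{\lVert AC \rVert}
$$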
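The simplest criterion, (SC1) and (SC2) can be sketched directly on top of the Table 1 functions. In this Python illustration the storage layout is hypothetical (counts live in plain dictionaries, and missing entries count as 0, as in the "else" branches of Table 1); R and P are approximated by the relationships and properties that carry counts. The test data are the counts visible in Fig. 5.

```python
class SalienceKB:
    def __init__(self, ccount, rcount, pcount):
        self.ccount = ccount    # concept -> n
        self.rcount = rcount    # (concept, rel, concept') -> n
        self.pcount = pcount    # (concept, prop) -> n
        self.R = {rel for (_, rel, _) in rcount}
        self.P = {prop for (_, prop) in pcount}
        self.AC = {c for c, n in ccount.items() if n > 0}   # active concepts

    def rpcount(self, c, rp):
        if rp in self.R:
            return sum(n for (x, rel, _), n in self.rcount.items() if x == c and rel == rp)
        return self.pcount.get((c, rp), 0)

    def rpactive(self, c, rp):
        return 1 if self.rpcount(c, rp) > 0 else 0

    def refs(self, c):          # total references to properties/relationships of c
        return sum(self.rpcount(c, rp) for rp in self.R | self.P)

    def distinct(self, c):      # number of different properties/relationships of c
        return sum(self.rpactive(c, rp) for rp in self.R | self.P)

    def _mean_over_AC(self, f):
        if not self.AC:
            return float("inf")  # no active concepts: nothing exceeds the average
        return sum(f(cj) for cj in self.AC) / len(self.AC)

    def salient_by_weight(self, c):
        return self.ccount.get(c, 0) > self._mean_over_AC(lambda x: self.ccount[x])

    def sc1(self, c):
        return self.refs(c) > self._mean_over_AC(self.refs)

    def sc2(self, c):
        return self.distinct(c) > self._mean_over_AC(self.distinct)
```

With the Fig. 5 counts, Notebooster accumulates 9 references over 4 different properties and relationships, against averages of 5 and 2.5 over the two active concepts, so all three criteria mark it salient while RAM1-1 is not.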
The following two criteria exploit the inherent specialization structure of concept hierarchies (cf. also (Lin 95) for a similar perspective on using semantic generalization relations for the computation of concept salience). They thus resemble criteria as used for the definition of macro rules to achieve summaries of texts (Correira 80, Dijk 80, Fum et al. 85). These criteria also incorporate some notion of graph connectivity that has previously been considered by (Lehnert 81) for text summarization purposes. (SC3) determines an active concept c as being salient iff a significant number of subordinates of c are active, too. (SC4) is similar, but it marks all non-active (!) concepts as being salient which are related to a significant number of active subordinates. Thus, concepts which have never been mentioned explicitly in a text can be included in the topic description. (SC4) only yields the most specific concepts, i.e., it excludes concepts for which the main criterion is fulfilled, but which are superordinate to another concept that also fulfills the criterion. Lastly, (SC4) has a more stringent cut-off criterion. This is necessary because it makes non-active concepts salient; accordingly, one has to be careful not to include irrelevant concepts. Therefore, (SC4) requires a quarter of all subordinates (at least 3) to be active, while (SC3) has a relative cut-off value which gives lower percentages for greater numbers of subordinates (the cut-off values have been determined empirically).
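(SC3) c is a salient concept iff

$$
\mathit{ccount}(c) > 0 \;\wedge\; \lVert \{c' \mid \textit{is-a}(c', c)\} \cap AC \rVert \;>\; \frac{\lVert \{c' \mid \textit{is-a}(c', c)\} \rVert}{\lVert \{c' \mid \textit{is-a}(c', c)\} \cap AC \rVert}
$$

(SC4) c is a salient concept iff

$$
\lVert \{c' \mid \textit{is-a}(c', c)\} \cap AC \rVert \ge 3 \;\wedge\; \mathit{ccount}(c) = 0 \;\wedge\; c \in \mathit{cand} \;\wedge\; \neg\exists c'' \in \mathit{cand}\colon \textit{is-a}(c'', c)
$$

where

$$
\mathit{cand} = \{c \mid \lVert \{c' \mid \textit{is-a}(c', c)\} \cap AC \rVert \ge 0.25\, \lVert \{c' \mid \textit{is-a}(c', c)\} \rVert\}
$$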
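(SC3) and (SC4) can be sketched over a small hierarchy. In this Python illustration, isa holds direct (sub, super) pairs, matching the non-transitive is-a of Table 1, and ccount maps concepts to activation weights. Multiplying both sides of the (SC3) inequality by the number a of active subordinates shows that it demands more than the square root of the number of subordinates, which is the "relative cut-off" mentioned above. For (SC4) we read the "main criterion" defining cand as including the at-least-3 condition; this is our interpretation, since otherwise every leaf concept (0 ≥ 0.25 · 0) would be a candidate and block its superordinates. All concept names in the test except Notebook, Notebooster and Notepad are hypothetical.

```python
def subordinates(c, isa):
    """{c' | is-a(c', c)} for direct (sub, super) pairs."""
    return {s for (s, sup) in isa if sup == c}

def active_subordinates(c, isa, ccount):
    return {s for s in subordinates(c, isa) if ccount.get(s, 0) > 0}

def sc3(c, isa, ccount):
    a = len(active_subordinates(c, isa, ccount))
    n = len(subordinates(c, isa))
    # a > n / a rewritten as a * a > n (valid for a > 0); a == 0 yields False.
    return ccount.get(c, 0) > 0 and a * a > n

def in_cand(c, isa, ccount):
    """Our reading of the main criterion: >= 3 and >= 25% of subordinates active."""
    a = len(active_subordinates(c, isa, ccount))
    return a >= 3 and a >= 0.25 * len(subordinates(c, isa))

def sc4(c, isa, ccount):
    """Non-active candidate with no subordinate that is itself a candidate."""
    if ccount.get(c, 0) != 0 or not in_cand(c, isa, ccount):
        return False
    return not any(in_cand(s, isa, ccount) for s in subordinates(c, isa))
```

In the test data, the never-mentioned concept Notebook becomes salient through its three active subordinates, whereas Computer is suppressed because its subordinate Notebook already qualifies.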
Salient Relationships and Salient Properties: 
Just as certain concepts may have been dealt with more extensively in a text than other ones, single features of a concept definition may have been more focused on than other features of the same concept. The following criterion renders a relationship (or property) rp salient if the number of concepts (or property values) to which c has been related via rp is greater than it is, on the average, the case for the relationships (or properties) of c. Note that c must be a concept learned during text parsing, as learning new features is only possible for such concepts. (SR1) is evaluated for salient concepts only, because we are not interested in salient features of concepts that are irrelevant for a topic description.
(SR1) A relationship or property rp of a salient concept c is considered salient in the context of c iff

$$
\sum_{rp_i \in R \cup P} \mathit{rpactive}(c, rp_i) > 3 \quad\text{and}\quad \mathit{existcount}(c, rp) \;>\; \frac{\sum_{rp_j \in R \cup P} \mathit{existcount}(c, rp_j)}{\sum_{rp_j \in R \cup P} \mathit{rpactive}(c, rp_j)}
$$
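For a single salient concept, (SR1) reduces to an average test. The Python sketch below assumes its inputs have already been derived via Table 1 (a hypothetical layout): existcount maps each relationship or property rp of c to existcount(c, rp), and active is the set of rp with rpactive(c, rp) = 1. The test values are those of Notebooster in Fig. 5: two flagged has-part facts, one each for manufactured-by and weight, and none for equipped-with (which is only counted, not newly learned).

```python
def salient_features(existcount, active):
    """(SR1): features whose flagged-fact count exceeds the average per active feature."""
    n_active = len(active)
    if n_active <= 3:                 # guard: more than 3 active features required
        return set()
    avg = sum(existcount.values()) / n_active
    return {rp for rp, n in existcount.items() if n > avg}
```

Here the average is (2 + 1 + 1 + 0) / 4 = 1, so only has-part is salient for Notebooster.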
Related Salient Concepts: 
A concept d is considered a related salient concept for the salient concept c if there is a relationship rel from c to d where the sum of the activation weights of all relationships of type rel from c to d or to subordinates of d is greater than the average activation weight of all active relationships for c. If d is determined as a related salient concept for c, then the associated relationship rel becomes a salient relationship of c. This criterion combines knowledge about conceptual aggregation and concept hierarchies with numerical weights.
(SRC1) A relationship rel between a salient concept c and some concept d is considered salient and d is considered a related salient concept iff

$$
\sum_{rel_i \in R} \mathit{rpactive}(c, rel_i) \ge 3 \quad\text{and}\quad \sum_{c_i \in \{c_i \,\mid\, c_i = d \,\vee\, \textit{is-a}(c_i, d)\}} \mathit{rcount}(c, \mathit{rel}, c_i) \;>\; \frac{\sum_{rel_j \in R} \mathit{rpcount}(c, rel_j)}{\sum_{rel_j \in R} \mathit{rpactive}(c, rel_j)}
$$
In the following, (c) denotes a salient concept c, (c r) a salient relationship r of concept c, and (c r d) a related salient concept d for concept c with respect to the relationship r.
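(SRC1) can be sketched in the same style, returning triples in the (c r d) notation just introduced. In this Python illustration, rcount maps (c, rel, c') to activation weights and isa holds direct (sub, super) pairs; both layouts are our assumptions. Candidates d are the concepts that c is directly related to, following the wording "if there is a relationship rel from c to d". The second test adds a hypothetical (rcount has-part Cpu 1) and the link 486SL is-a Cpu to show how weights of subordinates accumulate on d.

```python
def related_salient(c, rcount, isa):
    """Triples (c, rel, d) selected by (SRC1) for the salient concept c."""
    mine = {(rel, d): n for (x, rel, d), n in rcount.items() if x == c}
    per_rel = {}
    for (rel, _), n in mine.items():
        per_rel[rel] = per_rel.get(rel, 0) + n          # rpcount(c, rel)
    active = [rel for rel, n in per_rel.items() if n > 0]   # rpactive(c, rel) = 1
    if len(active) < 3:
        return set()
    avg = sum(per_rel.values()) / len(active)
    result = set()
    for rel, d in mine:
        covered = {d} | {s for (s, sup) in isa if sup == d}  # d and its subordinates
        weight = sum(n for (r, c2), n in mine.items() if r == rel and c2 in covered)
        if weight > avg:
            result.add((c, rel, d))
    return result
```

With the Fig. 5 weights of Notebooster the average is 8 / 3, and only has-part to 486SL (weight 3) exceeds it, which yields the item (Notebooster has-part 486SL) used in Section 3.2.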
3.2 Paragraph-Level Topic Descriptions 
The condensation operators just introduced are applied at the end of every paragraph to the text knowledge base which results from parsing that paragraph. They yield a set of salient concepts, relationships, properties, and related salient concepts. In the next step, these raw data are combined to form a compound topic description for that paragraph. The combination is performed according to the following rules:
• A salient concept (c) which is already covered by a salient relationship or property (c rp) or a related salient concept (c r d) is removed.
• A salient relationship (c r) already covered by a related salient concept (c r d) is removed.
After having determined the topic description $td$ of the last paragraph, a check is made whether this paragraph deals with the same topic as the immediately preceding paragraph(s), or vice versa. If this is the case, the topic description $td$ of the current paragraph is added to the topic description of the preceding paragraph(s); otherwise a new current topic description is created and set to $td$. Formally (cf. also Table 2):

Let $td$ be the topic description of the last paragraph and $td_i$ the topic description of one or more paragraphs immediately preceding $td$. Then $td_i$ is set to $td_i \sqcup td$ if $td_i \sqcup td = td_i$ or $td_i \sqcup td = td$; otherwise $td_i$ is not modified and $td_{i+1}$ is set to $td$.
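For example, the following two topic descriptions of adjacent paragraphs would be combined into one, because (Notebooster has-part) is already covered by (Notebooster has-part 486SL), so that their combination equals the first description:
{(Notebooster has-part 486SL), (Notepad)}
{(Notebooster has-part)}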
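The combination operator of Table 2 and the paragraph merging rule can be sketched as follows. This Python illustration encodes items as tuples (c,), (c, r) and (c, r, d) and topic descriptions as frozensets; the names add_item, combine and merge_paragraph are hypothetical. Since the rules of Table 2 only drop items that another item covers, the result of folding the elements one by one should not depend on their order, as far as we can see. Applying combine to an empty description and a paragraph's raw output also realizes the two removal rules listed above.

```python
def add_item(td, e):
    """td ⊔ {e} following Table 2."""
    if len(e) == 1:                                  # salient concept (c)
        covered = any(len(x) > 1 and x[0] == e[0] for x in td)
        return td if covered else td | {e}
    if len(e) == 2:                                  # salient relationship/property (c r)
        if any(len(x) == 3 and x[:2] == e for x in td):
            return td
        return (td | {e}) - {e[:1]}
    return (td | {e}) - {e[:1], e[:2]}               # related salient concept (c r d)

def combine(td1, td2):
    """td1 ⊔ td2: fold every element of td1 ∪ td2 into an empty description."""
    result = frozenset()
    for e in set(td1) | set(td2):
        result = add_item(result, e)
    return result

def merge_paragraph(history, td):
    """Paragraph merging rule; history holds td_1, ..., td_i (mutated and returned)."""
    if history:
        u = combine(history[-1], td)
        if u == history[-1] or u == td:
            history[-1] = u
            return history
    history.append(td)
    return history
```

The first two assertions in the accompanying check reproduce the example above; a paragraph about a different topic opens a new topic description instead.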
Analyzing a text this way yields a set of consecutive topic descriptions $td_1, \ldots, td_n$, each one characterizing the topic of one or more adjacent paragraphs. To every topic description $td_i$ we associate the corresponding text passage and the facts acquired from it. We call the resulting compound structure, in which different media combine, a (hyper)text constituent.
3.3 The Text Graph 
From the topic descriptions contained in the text constituents, more generic constituents can be derived in terms of a hierarchy of topic descriptions, forming a text graph. The construction of a text graph proceeds from the examination of every pair of basic topic descriptions and takes their conceptual commonalities to generate more generic thematic characterizations. Exhaustively applying this procedure (also taking the newly generated topic abstractions into consideration) results in a text graph as a hierarchy of topic descriptions. The most specific descriptions (they correspond to the text constituents) form the leaf nodes of the text graph; the generalized topic descriptions constitute its non-leaf nodes. Their hierarchical organization yields different levels of granularity of text summarization (see Fig. 6). It is exactly this emergent generalization property of the text graph that we consider the source of our scalability arguments. Very brief summaries, only intended to capture the main topics of the text, can be generated from the upper level of the text graph. Continuously deepening the traversal level of the text graph provides access to more and more specific information. Our procedure thus combines the potential for supplying summaries on the indicative as well as on the informative level of text knowledge abstraction (cf. (Borko & Bernier 75) for the distinction between indicative and informative abstracting).

[Figure 6: An Illustrative Fragment of a Text Graph (redundant is-a relations are omitted). Generalized topic descriptions form the upper part, text constituents (with attached text fragments) the lower part; node labels include "Notepad has-part", "Notebooster has-part 486SL" and "Notebooster has-part RAM1-1"; edges are labeled is-a and identity.]

$$
\begin{aligned}
td \sqcup \{(c)\} &= \begin{cases} td, & \text{if } \exists r\,(c\ r) \in td \,\vee\, \exists r, c'\,(c\ r\ c') \in td\\ td \cup \{(c)\}, & \text{else} \end{cases}\\
td \sqcup \{(c\ r)\} &= \begin{cases} td, & \text{if } \exists c'\,(c\ r\ c') \in td\\ (td \cup \{(c\ r)\}) \setminus \{(c)\}, & \text{else} \end{cases}\\
td \sqcup \{(c\ r\ c')\} &= (td \cup \{(c\ r\ c')\}) \setminus \{(c), (c\ r)\}\\
td \sqcup td' &= \bigsqcup_{e \in td \cup td'} \{e\}
\end{aligned}
$$

Table 2: The Operator ⊔ for Combining Topic Descriptions (\ stands for the set difference operator)
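The text graph construction can be sketched as exhaustive pairwise generalization. The paper does not spell out how "conceptual commonalities" are computed, so the generalization below is a hypothetical choice of ours: two items are generalized to the least common superconcept of their concepts, keeping the relationship if both share it and the target only if both targets coincide. The sketch also assumes single inheritance via a parent map; Notepad being a subconcept of Notebook is assumed for the example.

```python
from itertools import combinations

def ancestors(c, parent):
    """c followed by its superconcepts (single inheritance assumed)."""
    chain = [c]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain

def lcs(a, b, parent):
    """Least common superconcept of a and b, or None."""
    up_b = set(ancestors(b, parent))
    return next((x for x in ancestors(a, parent) if x in up_b), None)

def generalize_item(x, y, parent):
    """Hypothetical generalization of two items (c,), (c, r) or (c, r, d)."""
    c = lcs(x[0], y[0], parent)
    if c is None:
        return None
    if len(x) == 1 or len(y) == 1 or x[1] != y[1]:
        return (c,)                          # only the concept is shared
    if len(x) == 3 and len(y) == 3 and x[2] == y[2]:
        return (c, x[1], x[2])               # same relationship and same target
    return (c, x[1])                         # same relationship, targets differ

def generalize(td1, td2, parent):
    items = {generalize_item(x, y, parent) for x in td1 for y in td2}
    items.discard(None)
    return frozenset(items)

def text_graph(leaves, parent):
    """Return (nodes, edges) with edges pointing from specific to generic descriptions."""
    nodes, edges = set(leaves), set()
    changed = True
    while changed:                           # repeat until no new node or edge appears
        changed = False
        for a, b in combinations(list(nodes), 2):
            g = generalize(a, b, parent)
            if not g:
                continue
            new_edges = {(s, g) for s in (a, b) if s != g} - edges
            if g not in nodes or new_edges:
                nodes.add(g)
                edges |= new_edges
                changed = True
    return nodes, edges
```

For three leaves in the spirit of Fig. 6, the loop derives {(Notebooster has-part)} from the two Notebooster constituents and {(Notebook has-part)} as a more generic node above it; like Fig. 6, the resulting edge set still contains redundant shortcut links.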
4 Related Work 
The task domain of text summarization is characterized by a "clash of civilizations". From the point of view of natural language understanding proper (Schank & Abelson 77, Dyer 83), it is considered a heavily knowledge-based task requiring a substantial knowledge background. In the field of information retrieval, however, the corresponding task of automatic abstracting has been considered, from its very beginning (Luhn 58), a problem that can be dealt with by surface-level pattern matching techniques and statistical methods originally developed for lexical selection tasks such as automatic indexing or classification (Salton et al. 94). This approach has recently been given a lot of attention again, mainly due to the renaissance of statistical methodology in the field of parsing and tagging (Kupiec et al. 95). Given a statistical approach, however, automatic abstracting boils down to a sentence extraction problem, viz. determining the most salient sentences based on surface-level lexical or positional indicators.
We adhere to the knowledge-based paradigm of abstracting and propose to fully integrate text knowledge abstraction into a terminological reasoning model. In such an approach, text understanding and summarization are considered within a formally homogeneous framework. Moreover, and most importantly, this model allows for a staged provision of information in summaries based on conceptual criteria (as illustrated by the discussion of text graphs). Such a functionality is unlikely to be achieved by surface-oriented approaches due to their inherent limitations in providing cohesive summaries from large sets of extracted sentences (Paice 90).
5 Conclusions 
We have introduced an approach to text summarization which is solidly rooted in the formal semantics of the underlying terminological representation system. In this approach, text summarization is an operator-based transformation process on knowledge representation structures that have been derived by the text understanding system. Currently, the summarization process considers only activity and connectivity patterns in the text knowledge base. In the future, we plan to augment these criteria and to exploit text coherence patterns for summarization (cf. (Hahn 90) and related proposals by (Alterman 86)). The implementation of the summarization system and its associated text understander have proved functional with expository texts in the domain of information technology as well as with texts from the legal and business domains.

References 
Alterman, R. [1986] Summarization in the small. In N. E. Sharkey (Ed.), Advances in Cognitive Science 1 (pp. 72-93). Chichester: Ellis Horwood.

Borko, H., Bernier, C. L. [1975] Abstracting Concepts and Methods. New York etc.: Academic Press.

Correira, A. [1980] Computing story trees. American Journal of Computational Linguistics, 6 (3-4), 135-149.

Cullingford, R. E. [1978] Script Application: Computer Understanding of Newspaper Stories. New Haven, CT: Department of Computer Science, Yale University (Research Rep. 116).

DeJong, G. [1982] An overview of the FRUMP system. In W. Lehnert & M. H. Ringle (Eds.), Strategies for Natural Language Processing (pp. 149-176). Hillsdale, NJ: L. Erlbaum.

Dijk, T. A. van [1980] Macrostructures: an Interdisciplinary Study of Global Structures in Discourse, Interaction and Cognition. Hillsdale, NJ: L. Erlbaum.

Dyer, M. G. [1983] In-Depth Understanding: a Computer Model of Integrated Processing for Narrative Comprehension. Cambridge, MA: MIT Press.

Fum, D., Guida, G., Tasso, C. [1985] Evaluating importance: a step towards text summarization. IJCAI'85: Proc. of the 9th International Joint Conf. on Artificial Intelligence (Vol. 2, pp. 840-844). Los Angeles, Cal., 18-23 August 1985. Los Altos, CA: W. Kaufmann.

Hahn, U. [1989] Making understanders out of parsers: semantically driven parsing as a key concept for realistic text understanding applications. International Journal of Intelligent Systems, (3), 345-393.

Hahn, U. [1990] Topic parsing: accounting for text macro structures in full-text analysis. Information Processing & Management, 26 (1), 135-170.

Hutchins, J. W. [1987] Summarization: some problems and methods. Informatics 9: Proc. by the Aslib Co-ordinate Indexing Group: Meaning: the Frontier of Informatics (pp. 151-173). Cambridge, U.K., 26-27 March 1987. London: Aslib.

Kupiec, J., Pedersen, J., Chen, F. [1995] A trainable document summarizer. In SIGIR '95: Proc. of the 18th Annual International ACM SIGIR Conf. on Research and Development in Information Retrieval (pp. 68-73). Seattle, Wash., USA, July 9-13, 1995.

Lehnert, W. [1981] Plot units and narrative summarization. Cognitive Science, 5, 293-331.

Lin, C.-Y. [1995] Knowledge-based automatic topic identification. Proc. of the 33rd Annual Meeting of the Association for Computational Linguistics (pp. 308-310). Cambridge, Mass., USA, 26-30 June 1995.

Luhn, H. P. [1958] The automatic creation of literature abstracts. IBM Journal of Research and Development, 2 (2), 159-165.

Paice, C. D. [1990] Constructing literature abstracts by computer: techniques and prospects. Information Processing & Management, 26 (1), 171-186.

Rau, L. F. [1987] Knowledge organization and access in a conceptual information system. Information Processing & Management, 23 (4), 269-283.

Reimer, U., Hahn, U. [1988] Text condensation as knowledge base abstraction. Proc. of the 4th Conf. on Artificial Intelligence Applications [CAIA] (pp. 338-344). San Diego, Cal., March 14-18, 1988.

Salton, G., Allan, J., Buckley, C., Singhal, A. [1994] Automatic analysis, theme generation, and summarization of machine-readable texts. Science, 264 (3, June), 1421-1426.

Schank, R. C., Abelson, R. P. [1977] Scripts, Plans, Goals and Understanding: an Inquiry into Human Knowledge Structures. Hillsdale, NJ: L. Erlbaum.

Tait, J. I. [1985] Generating summaries using a script-based language analyser. In L. Steels & J. A. Campbell (Eds.), Progress in Artificial Intelligence (pp. 312-318). Chichester: Ellis Horwood.

Woods, W. A., Schmolze, J. G. [1992] The KL-ONE family. Computers and Mathematics with Applications, 23 (2-5), 133-177.

Young, S. R., Hayes, P. J. [1985] Automatic classification and summarization of banking telexes. Proc. of the 2nd Conf. on Artificial Intelligence Applications [CAIA] (pp. 402-408). Miami Beach, FL, December 11-13, 1985.
