ON TEXT COHERENCE PARSING 
Udo Hahn
Albert-Ludwigs-Universität Freiburg
Linguistische Informatik / Computerlinguistik
Friedrichstr. 50
D-W-7800 Freiburg i. Brsg.
Germany
email: hahn@supreme.coling.uni-freiburg.de
ABSTRACT 
In this paper global patterns of thematic text organization are
considered within the framework of a distributed model of text
understanding. Based on the parsing results of prior text cohesion
analysis, specialized text grammar modules determine whether some
well-defined text macro organization pattern is computable from the
available text representation structures. The model underlying text
coherence parsing formalizes hitherto entirely intuitive textlinguistic
notions whose origin can be traced back to Danes's work on thematic
progression patterns.
1 INTRODUCTION 
During the last years it has become increasingly apparent that dialog
and text understanding systems must account for connectivity relations
that extend over sentence boundaries. This has led to a bulk of work
dealing with various forms of cohesion-preserving language mechanisms,
mainly in the field of anaphora, which contribute to connectivity among
sentences. From the focus on these linguistic phenomena one might obtain
a misleading picture of textual connectivity, viz. one that considers it
basically as a 'flat', continuous stream of formally connected
utterances lacking additional structure. Far less research has been
devoted to the internal organization of cohesive utterances by
mechanisms at a more global level of dialog/text architecture, the level
of text coherence.
Major computational approaches related to coherence aspects within a
dialog processing framework are due to Reichman's [1978], McKeown's
[1985] and Scha & Polanyi's [1988] formalizations of dialog grammars.
Coherence criteria of written texts have been investigated in the
context of 'Rhetorical Structure Theory' [Mann & Thompson 1988] and
related extensions [e.g., Alterman 1982, Tucker, Nirenburg & Raskin
1986] of the original theory of coherence relations in discourse [Hobbs
1982]. A second major methodology which deals with the global
structuring of written texts is the model of text macro propositions and
superstructures [Kintsch & van Dijk 1978, van Dijk 1980], the latter
sharing all relevant properties one generally attributes to story
grammars [Rumelhart 1975]. The problem with this kind of methodology is
that, unlike the coherence relation approach, the grammars which have
been proposed so far are fairly idiosyncratic for each application
domain (narratives, weather reports, etc.). Common to all these
approaches is the requirement of a deep, propositionally guided
understanding of the underlying discourse; in particular, a complete
theory of its domain and an exhaustive specification of a natural
language grammar must be supplied in order to guarantee proper operation
of implemented systems. This might explain why, with only few
exceptions, these models of text coherence have resisted further
computational treatment as evidenced by operational systems.
Actes de COLING-92, Nantes, 23-28 août 1992
We here make an alternative and computationally more tractable proposal
on how to deal with global text structures at the text coherence level.
Its roots can be traced back to the seminal work of F. Danes [1974], in
which he informally developed the notion of thematic progression
patterns, distinguishing between three prototypical patterns, viz.
constant theme, continuous thematization of rhemes, and derived theme
(see section 3). The model outlined in this paper starts from a thorough
formalization of (one of) these notions and places it into the
environment of a fully operational text parsing system whose design is
mainly oriented towards the proper recognition of text cohesion and
coherence phenomena. Pertinent reasons for our choice of a pattern-type
model of text coherence are:
(1) The text parser forms part of the text understanding system TOPIC.
It operates in a real-world domain [Reimer & Hahn 1988], i.e. textual
input is taken from a permanent stream of test reports in major German
information technology magazines. As it seems that it will remain
infeasible for a long time to come to provide exhaustive domain and
grammar specifications for routinely operating text understanders, a
particularly robust partial parsing approach capable of handling
potential specification gaps has been adopted. These conditions
obviously preclude the consideration of RST-style coherence relation
computing as a text coherence analysis strategy, since relevant
knowledge portions might be lacking for determining specific instances
of coherence relations. Conversely, the coherence relation approach
seems currently infeasible for the routine processing of large-scale
text collections in real domains.
(2) The description of coherence structures in terms of coherence
relations or text macro propositions requires the availability of deep
assertional knowledge from the application domain (A-box level
specifications in Krypton terminology; cf. Brachman et al. [1985]). The
TOPIC system, however, emphasizes the role of terminological knowledge
of its domain, i.e. the description of prototypical properties and
inference rules related to basic conceptual units of the domain
(Krypton's T-box level knowledge). As TOPIC is rather weak with respect
to full-blown assertional knowledge, coherence relation computing,
however valuable it might be, is currently out of reach for this system.
Fortunately, Danes-type coherence patterns primarily refer to the level
of terminological knowledge.
(3) Prototypical patterns of thematic progression are fairly general and
independent of particular domains that expository texts deal with.
Linguistic studies have collected empirical evidence for this claim
through investigations of texts from diverse domains [Giora 1983a,
Kurzon 1984]. This coincides with the generality of use of most
coherence relations, but is in sharp contrast to the highly constrained
and domain-dependent model of superstructures and story grammars.
(4) Major thematic progression patterns are correlated
with particular search styles and retrieval modes in full- 
text information systems. Hence, providing typed co- 
herence operators inherently supports graphics-based 
user interactions with the TOPIC system in terms of ad- 
vanced conceptual orientation and navigation tools for 
semantically guided text graph tours (see section 5.3). 
(5) The investigation of thematic progression patterns
is of value in its own methodological right. They con- 
stitute a basic structural model of text macro organiza- 
tion as opposed to model-theoretic and plan/goal-based 
approaches (a distinction made by Pustejovsky [1987]).
As such they might complement current text under- 
standing methodologies whose emphasis, so far, has 
been on fairly knowledge-expensive assertional models 
(such as coherence relations and text macro proposi- 
tions) or stereotyped text-semantical models (such as 
superstructures and story grammars). 
2 MOTIVATING THE NEED FOR TEXT 
COHERENCE PARSING 
The model of text structure parsing we propose draws a careful
distinction between text cohesion and text coherence phenomena. To
illustrate text cohesion mechanisms in natural language texts, consider
the following text passage:
[1] The Delta-X from ZetaMachines Inc. is a computer
system that runs Unix V.3.
[2] The system is based on a 68020 processor.
[3] It has a 12-inch monochrome display and an integrated
telephone handset and built-in modem.
[4] Internally, there's a 40-megabyte hard disk, a 1.2-megabyte
5 1/4-inch floppy disk drive, 4.5 megabytes
of RAM, three RS-232C ports, and an ST-506 port.
Repeated occurrences of various text cohesion phenomena are illustrated
by nominal anaphora ('The system' in [2]) and pronominal anaphora ('It'
in [3]), both referring to the unique antecedent Delta-X (in [1]), while
'Internally, there's a ... hard disk' (in [4]) is linked to Delta-X via
textual ellipsis. The basic cohesion among these sentences yields the
common thematic background for constantly elaborating on a single topic
(Delta-X). An appropriate text parser should, first of all, recognize
these multiple cohesion phenomena and produce something like the
following representation structures (indicated by [...]R):
[1]R Delta-X < manufacturer: { ZetaMachines Inc. } >
     Delta-X < operating system: { Unix V.3 } >
[2]R Delta-X < CPU: { 68020 } >
[3]R Delta-X < peripheral devices: { 12-inch monochrome display } >
     Delta-X < peripheral devices: { telephone handset } >
     Delta-X < communication devices: { modem } >
[4]R Delta-X < external storage devices: { 40-megabyte hard disk } >
     Delta-X < external storage devices:
               { 1.2-megabyte 5 1/4-inch floppy disk drive } >
     Delta-X < main memory: { 4.5 megabytes of RAM } >
     Delta-X < ports: { 3 RS-232C ports } >
     Delta-X < ports: { ST-506 port } >
What is still lacking is a representation facility which 
characterizes this sequence of single assertions con- 
stantly referring to a single topic (Delta-X) as constitut- 
ing a coherent whole. Recognizing linguistic forms of 
text coherency and providing appropriate thematic 
grouping operators for text knowledge bases is what 
text coherence parsing mainly is about. Even if parsers perfectly
recognized and normalized all occurrences of text cohesion phenomena in
texts, missing recognition capabilities for text coherence phenomena
would nevertheless produce under-structured, incoherent text knowledge
bases in the sense that global pragmatic indicators of discourse
bracketing would be lacking.
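
The missing grouping facility can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the tuple layout (frame, slot, filler) and the function name group_by_topic are hypothetical and not part of TOPIC; the data are the normalized assertions listed for sentences [1] to [4].

```python
from collections import defaultdict

# Normalized assertions for sentences [1]-[4] as (frame, slot, filler).
ASSERTIONS = [
    ("Delta-X", "manufacturer", "ZetaMachines Inc."),
    ("Delta-X", "operating system", "Unix V.3"),
    ("Delta-X", "CPU", "68020"),
    ("Delta-X", "peripheral devices", "12-inch monochrome display"),
    ("Delta-X", "peripheral devices", "telephone handset"),
    ("Delta-X", "communication devices", "modem"),
    ("Delta-X", "external storage devices", "40-megabyte hard disk"),
    ("Delta-X", "external storage devices",
     "1.2-megabyte 5 1/4-inch floppy disk drive"),
    ("Delta-X", "main memory", "4.5 megabytes of RAM"),
    ("Delta-X", "ports", "3 RS-232C ports"),
    ("Delta-X", "ports", "ST-506 port"),
]


def group_by_topic(assertions):
    """Collect, per frame, the slots elaborated on (order of first mention)."""
    groups = defaultdict(list)
    for frame, slot, _filler in assertions:
        if slot not in groups[frame]:
            groups[frame].append(slot)
    return dict(groups)


GROUPS = group_by_topic(ASSERTIONS)
```

Without such a grouping step the knowledge base holds eleven unrelated assertions; with it, they are bracketed as one passage about a single topic.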
3 BASIC TEXT COHERENCE PATTERNS 
In this section, we informally describe the basic patterns of text
coherence focused on in this paper. According to Danes [1974] three
categories of thematic developments can be distinguished:
□ Constant Theme. This pattern is characterized by
the constant elaboration of one specific topic within
a text (passage) by considering several of its conceptual
facets. The following two paragraphs serve to
illustrate this major pattern of thematic progression
(the reference points to the constant theme (Delta-X)
are indicated by italics):
[T1.1]. The Delta-X from ZetaMachines Inc. is a
multiuser, multitasking computer system that runs
Unix V.3 and comes complete with most of the software
needed for business applications. The combination
host computer/workstation is based on a 68020 processor,
with dual 68000 processors providing peripheral
processing. It has a 12-inch monochrome display and an
integrated telephone handset and built-in modem.
Internally, there's a 40-megabyte hard disk, a 1.2-megabyte
5 1/4-inch floppy disk drive, 4.5 megabytes of
RAM, a network controller, three RS-232C ports, and
an ST-506 port.
□ Continuous Thematization of Rhemes. In
contrast to constant themes, this pattern realizes a
continuous shift of topics (visualized by bold italics).
The process starts with a theme and some comment
on that theme which we shall call rheme (actually,
an elaboration on one of its conceptual facets).
Now this rheme is focused on as the next theme that
is elaborated by a corresponding rheme, etc.:
[T1.2]. The $12,000 Delta-X host/workstation can
be supplied from ZetaMachines Inc., 2999 State St.,
Santa Barbara, CA 93105. ZetaMachines' sales manager,
Brian Wilson, says that they also plan to market
the Gamma-Z, a CAD/CAM workstation based on a
Connection Machine architecture. The underlying
theoretical foundations are due to D. Hillis, a former
M.I.T. student who first developed an experimental
prototype based on connectionist principles.
□ Derived Theme. Global text structure can also be
introduced by a variety of topics which share conceptual
commonalities (facets) at the knowledge representation
level (this need not be paralleled by
properties actually mentioned in the text!)
without the general concept being explicitly stated in
the text. Technically this is realized by a set of subordinates
or instances of a common (only implicit)
superordinate/prototype. Suppose the illustrative text
[T1] composed of its two constituent parts from
above, [T1.1] and [T1.2], is augmented by several
paragraphs dealing with Gamma-Z and Sigma-P
machines on a similar level of detail as those passages
which consider the Delta-X in [T1]:
[T2]. The Delta-X from ZetaMachines ... [T1.1, T1.2]
The Gamma-Z is a MS-DOS machine. Peripheral
devices include an 8-inch color display, a matrix printer,
and a keyboard ....
The Sigma-P system makes available a lot of
desirable application software such as a database system,
word processing, and a variety of games ....
This text implicitly has workstation as a derived
theme, since that is the immediate prototype concept
of those three instances (Delta-X, Gamma-Z,
Sigma-P) explicitly mentioned in [T2].
4 THE KNOWLEDGE SOURCES
INVOLVED IN TEXT PARSING
This section deals with the knowledge sources involved in actually
parsing a text. Basically (see Figure 1), these are constituted by the
PARSE BULLETIN, a blackboard-type memory which records the single events
of the parsing process, the DOMAIN KNOWLEDGE BASE, which contains the
domain-specific background knowledge needed for the parse, and various
EXPERTS for actually driving the parse through the text grammar
specifications they incorporate (cf. Hahn [1990] for a more
comprehensive presentation).
The PARSE BULLETIN has a flat list structure. It records the sequence of
text tokens as they appear in the text and, if relevant (see below),
notes their class identifiers (FRAME item, ADJective, etc.). More
important, constructive parsing activities based on operations of the
knowledge base and the parser are indicated at several positions
(so-called parse points) in the PARSE BULLETIN. The type of operation
being performed is indicated by a particular parse descriptor. Some are
internal to the management of the knowledge base, e.g., DEFACT (default
concept activation), while others indicate grammatical relations
recognized by the parser, such as NounATT (conceptual attribution
relations between nouns) and AdjATT (conceptual attribution relations
between adjectives and nouns). The items affected by an operation form a
so-called parse tuple.
The parser does not consider every token it receives from the input text
at the same level of detail. Instead, it distinguishes between words
which are significant to its performance (conceptually relevant ones,
such as nouns or adjectives which denote concepts in the domain
knowledge base, or linguistically relevant ones, such as negation
particles, certain conjunctions, quantifiers, etc.), and those that are
not (among them a wide variety of semantically indifferent nouns, verbs,
particles, etc., each of which is assigned the class identifier NIL).
The latter are simply discarded from further analysis, while the former
are assigned lexicalized grammar specifications. The parser has thus
been tuned towards partial parsing in a spirit similar to that advocated
by Schank et al. [1980] and achieves text understanding primarily on a
terminological level of knowledge representation.
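
The token filtering step behind partial parsing can be sketched as follows. This is a minimal illustration, not the TOPIC implementation: the mini-lexicon, the class label ADJ and the function names classify and significant are assumptions made for the example; only the identifiers FRAME and NIL come from the text.

```python
# Hypothetical mini-lexicon: word -> class identifier.
# Words that are absent are treated as NIL, i.e. irrelevant for the parse.
LEXICON = {
    "Delta-X": "FRAME",
    "ZetaMachines Inc.": "FRAME",
    "multiuser": "ADJ",
    "multitasking": "ADJ",
}


def classify(tokens, lexicon=LEXICON):
    """Pair each token with its class identifier (NIL if unknown)."""
    return [(tok, lexicon.get(tok, "NIL")) for tok in tokens]


def significant(classified):
    """Drop NIL tokens; only the remaining ones would be assigned
    lexicalized grammar specifications."""
    return [(tok, cls) for tok, cls in classified if cls != "NIL"]


TOKENS = ["The", "Delta-X", "from", "ZetaMachines Inc.", "is", "a",
          "multiuser", "computer"]
KEPT = significant(classify(TOKENS))
```

Treating unknown words as NIL is what makes the approach robust against lexicon gaps: a missing entry shrinks the analysis instead of aborting it.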
PARSE BULLETIN
[000]   0                                                           EOP
[001]   The                                                         NIL
[002]   Delta-X                                                     FRAME
[002.1] Delta-X |1|                                                 DEFACT
[003]   from                                                        NIL
[004]   ZetaMachines Inc.                                           FRAME
[004.1] ZetaMachines Inc. |1|                                       DEFACT
[004.2] Delta-X |2| < manufacturer |1|: { ZetaMachines Inc. |1| } >  NounATT
[010.3] Delta-X |4| < usage mode |1|: { multiuser } >               AdjATT
[010.4] Delta-X |5| < operating mode |1|: { multitasking } >        AdjATT
[013.2] Delta-X |6| < operating system |1|: { Unix V.3 |1| } >      NounATT
[033.3] Delta-X |9| < processors |1|: { 68020 |1| } >               NounATT
[037.3] Delta-X |10| < processors |2|: { 68020 |1|, 68000-1 |1|, 68000-2 |1| } >  NounATT
[039.2] 68000-1 |2| < function |1|: { peripheral processing } >     NounATT
[039.3] 68000-2 |2| < function |1|: { peripheral processing } >     NounATT
[046.3] display-1 |2| < presentation mode |1|: { monochrome } >     AdjATT
[046.4] Delta-X |11| < i/o devices |1|: { display-1 |1| } >         NounATT
[046.5] Delta-X |11| < peripheral devices |1|: { display-1 |1| } >  NounATT
[050.3] Delta-X |12| < peripheral devices |2|: { display-1 |1|, telephone |1| } >  NounATT
[053.2] Delta-X |13| < communication devices |2|: { telephone |1|, modem |1| } >  NounATT
[053.3] Delta-X |13| < peripheral devices |3|: { display-1 |1|, ..., modem |1| } >  NounATT
[054]   .                                                           PUNCT
[055]   0                                                           EOP
DOMAIN KNOWLEDGE BASE
Delta-X |13| < self: a-workstation >
             < CPU |1|: { 68020 |1| } [a-processor] >
Figure 1 A Snapshot of the Parser (also Pre-Conditions
Holding with respect to a Constant Theme Pattern)
The DOMAIN KNOWLEDGE BASE (KB for short) contains frame representation
structures. Each frame identifier (in bold face) is assigned a list of
slots (enclosed by angular brackets). These slots are associated with
two different kinds of slot fillers. Permitted slot fillers are enclosed
in square brackets, [a-frame name], which characterizes the range of
possible slot fillers by all those frames which are a subordinate or an
instance of frame name. Actual slot fillers are enclosed in curly braces
and can be taken as facts either known a priori to the system or
acquired continuously from the text as its understanding proceeds during
the parse.
In addition, each concept has attached to it an activation weight
counter. The values of the weight factors are enclosed by vertical bars
attached to each item; if no bars explicitly occur, a zero weight is
assumed. Activation weights are incremented (starting from zero-level
activation) whenever a noun denoting its associated concept occurs in
the text, and whenever structure-building operations in KB affect that
concept. The manipulation of activation weights serves several purposes,
the major one being their use as an indicator of salience of concepts
during the text condensation phase, during which text summaries are
generated from the text representation structures resulting from the
text parse [Reimer & Hahn 1988].
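
A frame with slots, permitted and actual fillers, and activation weight counters might be modelled as in the sketch below. The class and method names (Slot, Frame, mention, fill) are hypothetical, and the increment policy (one step per mention and per slot filling) is an assumption; the text only states that both kinds of events raise the weights.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    permitted: str                          # range restriction, e.g. "a-processor"
    fillers: list = field(default_factory=list)   # actual slot fillers
    weight: int = 0                         # activation weight of the slot


@dataclass
class Frame:
    name: str
    weight: int = 0                         # activation weight of the concept
    slots: dict = field(default_factory=dict)

    def mention(self):
        """A noun denoting this concept occurred in the text."""
        self.weight += 1

    def fill(self, slot_name, filler):
        """Structure-building operation: record an actual slot filler."""
        slot = self.slots[slot_name]
        if filler not in slot.fillers:
            slot.fillers.append(filler)
        slot.weight += 1
        self.weight += 1


delta_x = Frame("Delta-X", slots={"CPU": Slot(permitted="a-processor")})
delta_x.mention()               # "The Delta-X ..." occurs in the text
delta_x.fill("CPU", "68020")    # "... is based on a 68020 processor"
```

After these two events the frame carries weight 2 and its CPU slot weight 1, which is the kind of salience signal the condensation phase can rank by.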
The text grammar is composed of a set of distributed grammar experts,
each one responsible for some specific linguistic function (e.g.,
concept attribution via nominal, adjectival or prepositional phrases,
anaphora). Each expert is characterized by a unique EXPERT NAME and is
activated by a message event, i.e., by receiving a message text which
may contain some parameters. In order to check its competence in
contributing to the parse, pre-conditions composed of complex test
predicates are evaluated. If these pre-conditions hold for that expert,
the post-conditions immediately apply, i.e. messages are sent to
qualified actors (to other grammar experts, to the domain KB or to the
bulletin).
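
The message-driven control regime of the experts can be sketched as follows. This is an illustrative reading of the paragraph above, not the TOPIC code: the class Expert, its receive method, the message format and the demo expert EOP-DEMO are all assumptions.

```python
class Expert:
    """A grammar expert: pre-conditions guard, post-conditions send messages."""

    def __init__(self, name, pre_condition, post_condition):
        self.name = name
        self.pre_condition = pre_condition      # params -> bool
        self.post_condition = post_condition    # params -> [(receiver, message)]

    def receive(self, params, outbox):
        """Handle a message event; emit messages only if competent."""
        if self.pre_condition(params):
            outbox.extend(self.post_condition(params))
            return True
        return False


# Hypothetical expert that reacts only to end-of-paragraph events.
eop_expert = Expert(
    name="EOP-DEMO",
    pre_condition=lambda p: p.get("class") == "EOP",
    post_condition=lambda p: [("bulletin", ("note", p["pos"]))],
)

OUTBOX = []
FIRED = eop_expert.receive({"class": "EOP", "pos": 55}, OUTBOX)
SKIPPED = eop_expert.receive({"class": "NIL", "pos": 3}, OUTBOX)
```

An expert whose pre-conditions fail simply stays silent, so several experts can be offered the same event without central arbitration.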
5 A DISTRIBUTED MODEL OF TEXT
COHERENCE PARSING
In this paper, we shall not go into the details of phrasal, clausal, and
text cohesion parsing (cf. Hahn [1989] for an in-depth consideration of
related technical issues). Instead, we assume that these preliminary
activities have already been carried out properly and that some initial
structural representation is already available from the bulletin. These
requirements are fulfilled in the snapshot of the PARSE BULLETIN in
Figure 1, taken after all local parsing events have terminated; this
characterizes a state ready to turn to the activation of global text
structure computing experts.
We here consider the end of the paragraph (denoted by the symbol 0 and
the class identifier EOP) as an anchoring point for coherence
computation. It is motivated by the observation that, at least in the
sublanguage domain we are currently working in, major topic movements
occur predominantly at paragraph boundaries. This coincides with
linguistic evidence for the (text)grammatical status of paragraphs
[Hinds 1979, Giora 1983b, and Zadrozny & Jensen 1991]. Therefore, the
proper recognition of textual macro structures is always initialized at
the end of a paragraph.
5.1 Considering Constant Theme 
Constant theme is a coherence pattern which is characterized by multiple
occurrences of a single frame in the PARSE BULLETIN within one
paragraph. Most of its occurrences, in turn, are accompanied by a slot
and/or slot filler indicating that some knowledge base operation with
respect to frame has been carried out in KB (e.g., slot filling as
indicated by NounATT or AdjATT for which we shall introduce the LC*
descriptor as a convenient shorthand notation). It is the continuous
elaboration of that particular concept that makes the corresponding text
passage coherent. While the bulletin maintains the sequential order of
these operations, KB provides the conceptual background for continuous
references to the same frame object.
Figure 2 visualizes the description for constant theme; the DOMAIN
KNOWLEDGE BASE window displays all properties of frame dealt with in a
text (passage) in the shadowed area of the frame box, while those not
mentioned in the text are in the remaining white part. Consequently, it
is neither necessary that all slots of a frame available in the
knowledge base be referred to in the text (as with slot_n+1, ...,
slot_m), nor that there be any ordering constraint relating single slots
of a frame in KB to the sequence of slot filling operations in the PARSE
BULLETIN.
PARSE BULLETIN
[..] 0                                           EOP
[..] frame                                       FRAME
[..] frame < slot_1 : { slot filler_1 } >        LC*
[..] frame < slot_2 : { slot filler_2 } >        LC*
[..] frame < slot_i : { slot filler_i } >        LC*
[..] frame < slot_n-1 : { slot filler_n-1 } >    LC*
[..] frame < slot_n : { slot filler_n } >        LC*
[..] 0                                           EOP
DOMAIN KNOWLEDGE BASE
frame < self: a-frame >
      < slot_1 : { ... } [...] >
      < ... >
      < slot_m : { ... } [...] >
Figure 2 The General Constant Theme Configuration Pattern
The general pattern from Figure 2 is already present in Figure 1. This
contains a description of the parsing results of the first paragraph of
text [T1.1]. The entries in the PARSE BULLETIN have been worked out by
experts for linguistic phenomena on the local level of phrasal, sentence
and text cohesion analysis. For the purpose of constant theme
computation, we need only consider those entries whose parse descriptor
designates manipulations of slots or slot values of some frame (LC*-type
descriptors, such as NounATT or AdjATT). Other descriptors are
irrelevant here and have been left out on purpose in Figure 1. From this
we construct the set THEMES. It consists of triples ( frame, slot,
bullpos ) where frame is the name of a frame, and slot is the name of a
slot of that frame, both co-occurring as lexical parameters of some
parse tuple in the PARSE BULLETIN with a LC*-type parse descriptor;
bullpos gives the parse point in the PARSE BULLETIN where frame and slot
occur simultaneously. With respect to Figure 1 THEMES is given by:
THEMES = { ( Delta-X, manufacturer, 004 ),
           ( Delta-X, usage mode, 010 ),
           ( Delta-X, operating mode, 010 ),
           ( Delta-X, operating system, 013 ),
           ( Delta-X, application domain, 024 ),
           ( Delta-X, CPU, 033 ),
           ( Delta-X, processors, 033 ),
           ( Delta-X, processors, 037 ),
           ( 68000-1, function, 039 ),
           ( 68000-2, function, 039 ),
           ( display-1, size, 046 ),
           ( display-1, presentation mode, 046 ),
           ( Delta-X, i/o devices, 046 ),
           ( Delta-X, peripheral devices, 046 ),
           ( Delta-X, communication devices, 050 ),
           ( Delta-X, peripheral devices, 050 ),
           ( Delta-X, communication devices, 053 ),
           ( Delta-X, peripheral devices, 053 ) }
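
The construction of THEMES from bulletin entries can be sketched in Python. The sketch rests on assumptions: the entry layout (parse point, parse tuple, parse descriptor) follows the format given for bulletin references, the parse tuple is reduced here to a (frame, slot) pair, and bullpos is taken to be the integer part of the parse point (e.g. 004.2 gives 004), which matches the triples listed above. The function name build_themes is hypothetical.

```python
LC_DESCRIPTORS = {"NounATT", "AdjATT"}   # the LC*-type descriptors named in the text

# A few bulletin entries as (parse point, parse tuple, parse descriptor).
BULLETIN = [
    ("002.1", ("Delta-X",), "DEFACT"),
    ("004.2", ("Delta-X", "manufacturer"), "NounATT"),
    ("010.3", ("Delta-X", "usage mode"), "AdjATT"),
    ("039.2", ("68000-1", "function"), "NounATT"),
    ("054", (".",), "PUNCT"),
]


def build_themes(bulletin):
    """Keep only LC*-type entries and reduce them to (frame, slot, bullpos)."""
    themes = set()
    for point, parse_tuple, descriptor in bulletin:
        if descriptor in LC_DESCRIPTORS and len(parse_tuple) == 2:
            frame, slot = parse_tuple
            bullpos = int(point.split(".")[0])   # 004.2 -> 4
            themes.add((frame, slot, bullpos))
    return themes


THEMES_SAMPLE = build_themes(BULLETIN)
```

Entries with other descriptors (DEFACT, PUNCT) are skipped, mirroring the remark that only slot or slot value manipulations matter for constant theme computation.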
When considering THEMES, we want the criterion for constant theme to be
specified in a way that accounts for the fact that up to parse point
'037' each slot (value) manipulation refers to one particular theme
(Delta-X). Between parse point '039' and '046' there is a minor
thematical distortion in that there is no proper reference to that
theme, although slots are mentioned which are associated with other
concepts. However, from parse point '046' onward the already established
theme is taken up again till the end of the paragraph. In conclusion,
Delta-X seems to be a proper candidate for consideration as a constant
theme of that paragraph (cf. footnote 1).
Figure 1 provides a snapshot of the pre-conditions that are encountered
by the CT EXPERT, the coherence expert for Constant Theme. Running
twice, supplied with different parameters, it works out the results
alluded to above. The grammatical knowledge needed for the determination
of a constant theme is incorporated in its pre-condition part. This
expression is evaluated TRUE iff constant-theme produces some theme and
an associated non-empty set RHEMES related to theme, otherwise it is
FALSE. The conditions for a constant theme can now be stated more
precisely:
constant-theme( textpos, testpos )
= ( theme, RHEMES, newpos ) iff
(a) $\mathit{testpos} < \mathit{textpos}$ &
(b) ( textpos, 0, EOP ) is in the PARSE BULLETIN (cf. footnote 2) &
(c) ( prepos, 0, EOP ) is also in the PARSE BULLETIN
    such that $\mathit{prepos} < \mathit{textpos}$ and such that no
    other triple with '0' as text item intervenes between
    prepos and textpos in the PARSE BULLETIN &
(d) $\mathit{newpos} \in [\max(\mathit{prepos}, \mathit{testpos})+1,\ \mathit{textpos}-1]$ &
(e) theme is a frame in the DOMAIN KNOWLEDGE BASE &
(f) $\forall k_i \in [\max(\mathit{prepos}, \mathit{testpos})+1,\ \mathit{newpos}-1]$:
    $(\mathit{theme}, \mathit{slot}, k_i) \in \mathit{THEMES} \Longrightarrow \mathit{slot} \in \mathit{RHEMES}$ &
(g) $\neg\exists k' \in [\max(\mathit{prepos}, \mathit{testpos})+1,\ \mathit{newpos}-1]$:
    (α) alt_theme (distinct from theme) is a frame
        in the DOMAIN KNOWLEDGE BASE &
    (β) $(\mathit{alt\_theme}, \mathit{slot}', k') \in \mathit{THEMES}$ &
    (γ) $\neg\exists \mathit{ts}_{k'} \in \mathit{THEMES}$:
        $\mathit{ts}_{k'} = (\mathit{theme}, \mathit{slot}, k')$ &
(h) $|\mathit{RHEMES}| > 2$ &
(i) newpos is maximal in the sense that
    $\neg\exists \Delta\mathit{pos} \in [\max(\mathit{prepos}, \mathit{testpos})+1,\ \mathit{textpos}-1]$:
    $\Delta\mathit{pos} > \mathit{newpos}$ &
    conditions (c) - (g) apply, too.
Otherwise, constant-theme( textpos, testpos ) = *
1 Clearly, this discussion should not be taken such that the formal
characterization given below only holds for the specific sample text
referred to throughout this paper. Instead, it should indicate that,
although the basic idea of thematic progression patterns is
overwhelmingly simple, real-life texts tend to be less homogeneous with
respect to these patterns than one may consider under clean laboratory
conditions. Thus, formal descriptions have to be inherently robust
towards such local forms of digressions.
2 References to entries in the PARSE BULLETIN have the format
( ParsePoint, ParseTuple, ParseDescriptor ).
Some comments related to this specification:
(a) The parameters supplied to constant-theme span
the spatial extension in PARSE BULLETIN which
is searched for a constant theme; textpos always
denotes the end of the current paragraph, i.e. the
upper bound of the search area, while testpos delimits
its lower bound.
(b) The parse point characterized by textpos must
contain the end-of-paragraph symbol 0.
(c) Since testpos may be any arbitrary parse point
preceding textpos, prepos denotes the parse point in
PARSE BULLETIN that contains the end-of-paragraph
symbol occurring right before the one on
parse point textpos.
(d) After fixing the search interval in the bulletin for
which a constant theme is going to be computed,
newpos allows for various choices as to how far a
constant theme may actually extend in that interval.
(e) theme may be any frame from KB.
(f) A theme is related to its various rhemes according
to the following condition: at each bulletin position
(k) where theme occurs in THEMES within the
interval delimited by newpos, its associated slot
(single rheme) is assigned to the set RHEMES.
(g) To guarantee that theme is the only topic dealt with
in the text, we also require that no alt_theme different
from theme occur in the chosen interval such
that it also forms part of THEMES; (γ) accounts
for more complicated cases where both alt_theme
and theme may occur at the same parse point.
(h) To rule out insignificant occurrences of theme the
cardinality of RHEMES must exceed a certain level.
(i) The maximality criterion for newpos rules out
choosing too small values of newpos.
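
Conditions (a) to (i) can be turned into a small executable sketch. It is one possible reading, not the TOPIC implementation, and it rests on labeled assumptions: parse points are plain integers, the frames occurring in THEMES stand in for the DOMAIN KNOWLEDGE BASE in condition (e), and the "otherwise" case is represented by None. The THEMES data and the two end-of-paragraph points (000 and 055) are those of the sample paragraph.

```python
# THEMES of the sample paragraph: (frame, slot, parse point).
THEMES = {
    ("Delta-X", "manufacturer", 4),
    ("Delta-X", "usage mode", 10),
    ("Delta-X", "operating mode", 10),
    ("Delta-X", "operating system", 13),
    ("Delta-X", "application domain", 24),
    ("Delta-X", "CPU", 33),
    ("Delta-X", "processors", 33),
    ("Delta-X", "processors", 37),
    ("68000-1", "function", 39),
    ("68000-2", "function", 39),
    ("display-1", "size", 46),
    ("display-1", "presentation mode", 46),
    ("Delta-X", "i/o devices", 46),
    ("Delta-X", "peripheral devices", 46),
    ("Delta-X", "communication devices", 50),
    ("Delta-X", "peripheral devices", 50),
    ("Delta-X", "communication devices", 53),
    ("Delta-X", "peripheral devices", 53),
}
EOP_POINTS = [0, 55]   # parse points holding the end-of-paragraph symbol


def constant_theme(textpos, testpos, themes=THEMES, eop_points=EOP_POINTS):
    # (a), (b): testpos precedes textpos, and textpos is an EOP point.
    if not (testpos < textpos and textpos in eop_points):
        return None
    # (c): prepos is the closest EOP point before textpos.
    earlier = [p for p in eop_points if p < textpos]
    if not earlier:
        return None
    prepos = max(earlier)
    lower = max(prepos, testpos) + 1
    # (e): assumption: frames occurring in THEMES stand in for the KB.
    frames = sorted({frame for frame, _, _ in themes})
    # (d), (i): try the largest admissible newpos first.
    for newpos in range(textpos - 1, lower - 1, -1):
        window = range(lower, newpos)          # the interval [lower, newpos-1]
        for theme in frames:
            own_points = {k for f, _, k in themes if f == theme}
            # (f): slots of theme inside the window form RHEMES.
            rhemes = {s for f, s, k in themes if f == theme and k in window}
            # (g): no other frame at a point where theme itself is absent.
            intruder = any(
                f != theme and k in window and k not in own_points
                for f, _, k in themes
            )
            # (h): cardinality threshold as stated in the definition.
            if not intruder and len(rhemes) > 2:
                return theme, rhemes, newpos
    return None


FIRST = constant_theme(55, 0)
SECOND = constant_theme(55, 39)
```

Searching newpos downward from textpos-1 makes the first hit the maximal one, so criterion (i) needs no separate check. On the sample data FIRST names Delta-X with seven rhemes and newpos 39, and SECOND names Delta-X with three rhemes and newpos 54.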
Let us now consider an example of the computation processes involved in
actual coherence parsing (see Figure 1). Various coherence experts start
execution upon consumption of the 0 symbol (indicating the end of a
paragraph) by the administration expert of the parser, but we shall
limit our attention to CT EXPERT (since the others will eventually
starve). After receiving Check-CT( EOP, 055, 000 ) as its starting
message, constant-theme is supplied with initial parameters: textpos =
055, testpos = 000. Obviously, prepos = 000, since the analysis starts
for the first paragraph of the text. newpos may now range from '001' to
'054'. Let us consider Delta-X as theme. (This is a proper choice. If
improper choices were made, constant-theme would not produce a
significant result.) The choice for newpos must accommodate the
temporary breakdown of the selected theme beginning from position '039',
since we have k' = 039 ∈ [001, 054] with alt_theme = 68000-1 (or
68000-2) in THEMES and no proper triple ( Delta-X, slot, 039 ) as
required by condition g(γ) above. So newpos has to be adjusted properly
to the parse point '039', at which point the constant theme pattern for
Delta-X eventually terminates for the first time. This produces:
constant-theme( 055, 000 ) = ( Delta-X,
{manufacturer, usage mode, operating mode, operating system,
application domain, CPU, processors}, 039 )
and CT EXPERT issues a CT-Group reading to KB incorporating the constant
theme together with its associated rhemes.
Since lhc PARSE BUIJ ,I ¢TIN hlts not exhat, stivc- 
ly lmen investigated with restmct to its coherence data 
PRO}C. Of, U()I,IN{; 92, NANIE:% AU{;. 2L28, 1992 
(newpos+l < textpos), CT EXPERT resumes execu- 
tion, now starting with a-~econd set of parameters: 
textpos = 055, testpos = 039 (see the second expert 
placed into the foreground in Figure 1). Again, prepos 
-- 000, but due to the new testpos parameter newpos is 
now in the interval \[40, 54\]. The evaluation of con- 
stant-theme( 055, 039 ) starts with a proper choice of 
newpos = 054. testpos+ l excludes 68000-1 (68000-2) 
from further consideration. Finally, we obtain 
cor~t~nt-theme( 055, 039 ) = ( Delta-X, 
{i /o devices, peripheral devices, communication devices},054) 
Note that the occurrence of display-1 at parse point
'046' does not conflict with criterion (g), since we also
have Delta-X (thematically related to i/o devices and
peripheral devices) at that parse point (cf. criterion g(z)).
Since the end of the paragraph has been reached, the
coherence computation process halts.
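The two evaluation rounds just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the formal definition of constant-theme: the bulletin is modelled as a hypothetical dictionary from parse points to (frame, slot) entries, parse points without entries are skipped, and a breakdown is taken to be the first parse point that has entries but none for the chosen theme. The toy bulletin and the helper names `constant_theme` and `ct_expert` are invented for the example; only the parameter names textpos, testpos and newpos and the resumption test newpos+1 < textpos come from the text.

```python
# Hypothetical toy bulletin: parse point -> list of (frame, slot) entries.
# Parse point 8 plays the role of the end-of-paragraph marker (textpos).
BULLETIN = {
    1: [("Delta-X", "manufacturer")],
    2: [("Delta-X", "CPU")],
    3: [("68000-1", "clock rate")],            # temporary breakdown of the theme
    4: [("Delta-X", "i/o devices")],
    5: [("display-1", "resolution"),           # harmless: Delta-X is also present
        ("Delta-X", "peripheral devices")],
    7: [("Delta-X", "communication devices")],
}


def constant_theme(bulletin, theme, textpos, testpos):
    """Scan parse points testpos+1 .. textpos-1 for entries about `theme`.

    Returns (theme, slots, newpos). newpos is the first parse point whose
    entries do not mention `theme`, or textpos-1 if no breakdown occurs.
    """
    slots = []
    for pos in range(testpos + 1, textpos):
        entries = bulletin.get(pos, [])
        hits = [slot for frame, slot in entries if frame == theme]
        if entries and not hits:
            return theme, slots, pos           # theme breaks down here
        for slot in hits:
            if slot not in slots:
                slots.append(slot)
    return theme, slots, textpos - 1


def ct_expert(bulletin, theme, textpos):
    """Re-run constant_theme until the paragraph is exhausted."""
    groups, testpos = [], 0
    while True:
        _, slots, newpos = constant_theme(bulletin, theme, textpos, testpos)
        if slots:
            groups.append((theme, slots, newpos))
        if newpos + 1 < textpos:               # bulletin not yet exhausted
            testpos = newpos
        else:
            return groups


for group in ct_expert(BULLETIN, "Delta-X", textpos=8):
    print(group)
```

On the toy data the first round stops at the breakdown (parse point 3) and the second round runs to the last parse point before the paragraph end, mirroring the pair of results for '039' and '054' above.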
Figure 3 represents the effects of grouping a constant
theme and the rhemes referred to in the text passage
(cf. [055.1] and [055.2]) by the shadowed area of
the (frame) box. This indicates that the grouped items
are treated coherently in a text passage.
[Figure 3: PARSE BULLETIN entries
  [055.1] Delta-X { manufacturer, usage mode, operating mode, operating system,
                    application domain, CPU, processors } CT
  [055.2] Delta-X { i/o devices, peripheral devices, communication devices } CT
shown next to the (frame) box, with the grouped items shaded.]
Figure 3 Post-Conditions Holding with respect to a Constant
Theme Pattern
5.2 Remarks on Continuous Thematization 
of Rhemes and Derived Theme 
Similarly, formal descriptions have been worked out
for the other two basic text coherence patterns mentioned
above. Instead of a full treatment, we give two rather
informal sketches of the underlying regularities as they
have been incorporated into our framework. Continuous
thematization of rhemes most significantly departs
from the constant theme schema just outlined (in fact,
both are mutually exclusive) in that the former incorporates
a continuous shift of the topics being considered.
Figure 4 illustrates this permanent change of issues in a
text. The PARSE BULLETIN contains a sequence of
local theme-rheme pairs with frame_Ti being the current
local theme and slotfiller_Ti being its associated local
rheme. Text coherence is due to the fact that the current
local rheme (slotfiller_Ti) becomes the next local
theme (frame_T(i+1)). This rheme-specific connectivity
criterion is stressed by the double-sided black arrows in
the DOMAIN KNOWLEDGE BASE which link the
immediately preceding rheme to its identical theme successor,
while local theme-rheme connections are indicated
by the one-sided grey arrows which go from the local
theme to its associated local rheme. A sequence of local
theme-rheme pairs fulfilling the rheme-specific connectivity
criterion in terms of overlapping parameters (current
rheme becomes next theme) constitutes what is
here called continuous thematization of rhemes,
i.e. a global theme-rheme cluster.
[Figure 4: PARSE BULLETIN schema; the DOMAIN KNOWLEDGE BASE graphic is not reproduced]
  [...]  0             EOP
  [...]  frame_T1      < slot_T1: { slotfiller_T1 = frame_T2 } >              LC
  ...
  [...]  frame_Ti      < slot_Ti: { slotfiller_Ti = frame_T(i+1) } >          LC
  [...]  frame_T(i+1)  < slot_T(i+1): { slotfiller_T(i+1) = frame_T(i+2) } >  LC
  ...
  [...]  frame_Tn      < slot_Tn: { slotfiller_Tn } >                         LC
  [...]  0             EOP
Figure 4 The General Continuous Thematization of Rhemes
Configuration Pattern
An illustration is given by text fragment [T1.2] in section
3 where bold italics stress the emerging global
theme-rheme cluster constituted by the following sequence
of overlapping local theme-rheme pairs:
Delta-X - manufacturer - ZetaMachines Inc.,
ZetaMachines Inc. - product - Gamma-Z,
Gamma-Z - architecture - Conn. Machine architecture,
Conn. Machine architecture - developer - D. Hillis
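The overlap criterion (current rheme becomes next theme) is easy to state operationally. The following Python sketch is a simplified illustration, not the formal description used in the parser: it assumes that the bulletin has already been reduced to an ordered list of (frame, slot, slot filler) triples and that identity of names stands in for identity of knowledge base items. The function name `ctr_clusters` is hypothetical; the triples are those of fragment [T1.2].

```python
# Local theme-rheme pairs of fragment [T1.2] as (frame, slot, slot filler).
T12 = [
    ("Delta-X", "manufacturer", "ZetaMachines Inc."),
    ("ZetaMachines Inc.", "product", "Gamma-Z"),
    ("Gamma-Z", "architecture", "Conn. Machine architecture"),
    ("Conn. Machine architecture", "developer", "D. Hillis"),
]


def ctr_clusters(triples):
    """Split a sequence of triples into maximal chains in which each
    slot filler (local rheme) is the frame (local theme) of its successor.
    Chains of length 1 show no overlap and are dropped."""
    clusters, run = [], []
    for triple in triples:
        if run and run[-1][2] != triple[0]:    # overlap criterion violated
            if len(run) > 1:
                clusters.append(run)
            run = []
        run.append(triple)
    if len(run) > 1:
        clusters.append(run)
    return clusters


for cluster in ctr_clusters(T12):
    print(" -> ".join([cluster[0][0]] + [filler for _, _, filler in cluster]))
```

For [T1.2] all four pairs fall into a single cluster, which corresponds to the global theme-rheme cluster named in the text.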
The third pattern further generalizes the results of
the foregoing coherence computations on the paragraph
level and extends them over various (adjacent)
paragraphs and possibly over the whole text. Consider
a series of paragraphs, each one dealing exclusively
with one special topic (see Figure 5 below). The first
paragraph deals with frame_T1, the second one elaborates
on frame_T2, etc. A derived theme can be computed
when all these different (sub)topics can be linked
to the most specific general (super)topic (frame_T). In
technical terms, these subtopics are all instances of that
supertopic. Text [T2] illustrates this phenomenon: there
are three paragraphs whose major topics are Delta-X,
Gamma-Z, and Sigma-P; a conceptual generalization
step links them to the derived theme workstation. In
Figure 5 this relationship is indicated by the arrows
pointing from each subtopic (of a single paragraph) to
its supertopic, thematically characterizing these paragraphs
on a more general level of conceptualization.
[Figure 5: PARSE BULLETIN schema with one block of entries per paragraph
(entries for frame_T1, EOP; ...; entries for frame_Tn, EOP), each paragraph
topic being linked by an arrow to its supertopic frame_T]
Figure 5 The General Derived Theme Configuration Pattern
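The generalization step behind a derived theme can be sketched as a search for the most specific common superconcept of the paragraph topics. The Python fragment below is a minimal illustration under strong assumptions: the frame hierarchy is a hypothetical single-inheritance is-a table (the entries above workstation are invented), each paragraph has already been reduced to one major topic, and the names `ancestors` and `derived_theme` do not come from the paper.

```python
# Hypothetical is-a table (child -> parent); only the three workstation
# instances and their supertopic are taken from text [T2].
IS_A = {
    "Delta-X": "workstation",
    "Gamma-Z": "workstation",
    "Sigma-P": "workstation",
    "workstation": "computer",
    "computer": "hardware",
    "printer-1": "printer",
    "printer": "hardware",
}


def ancestors(concept, is_a):
    """Superconcepts of `concept`, ordered from most to least specific."""
    chain = []
    while concept in is_a:
        concept = is_a[concept]
        chain.append(concept)
    return chain


def derived_theme(topics, is_a):
    """Most specific superconcept shared by all paragraph topics, or None."""
    if not topics:
        return None
    first, *rest = topics
    for candidate in ancestors(first, is_a):   # most specific candidate first
        if all(candidate in ancestors(topic, is_a) for topic in rest):
            return candidate
    return None


print(derived_theme(["Delta-X", "Gamma-Z", "Sigma-P"], IS_A))
```

Walking the ancestor chain of the first topic from the bottom up guarantees that the first shared superconcept found is also the most specific one, which is the property the text requires of the supertopic.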
5.3 The Merits of Text Coherence Parsing 
Among the many advantages of having text coherence
phenomena under computational control we here emphasize
their potential for information retrieval dialogs.
Evidence for this comes from our experiments with
TOPOGRAPHIC, an interactive graphical interface to
TOPIC's text knowledge bases [Thiel & Hammwöhner
1987]. In particular, we observed a close functional
relationship between the selection of particular coherence
patterns and particular search states during the retrieval
process which is performed on network representations
of text summaries, so-called text graphs:
1) Constant Theme coherently characterizes a variety
of facts related to one particular topic. A CT-based
search operation enhances the user's knowledge of that
topic by presenting facets (or data related to those facets)
the user is probably not aware of, although they
may be relevant to the solution of his or her problem.
2) Continuous Thematization of Rhemes links a
set of formerly unrelated topics by a coherent line of
conceptual dependencies (current rheme becomes next
theme). A CTR-based search operation therefore provides
the basis for thematical associations and stimulates
previously unconsidered lines of reasoning by
thematically constrained browsing.
3) Derived Theme groups hierarchically related topics
and thus may enhance the knowledge of alternatives
of the particular topic (and facts related to it) under
focused attention of the user (by way of stimulating
comparisons, recognizing information gaps, etc.).
6 FINAL REMARKS
In this paper, a structural model of text coherence computation
has been proposed that strongly exploits the
knowledge chunking inherent to frame representations.
These precompiled knowledge structures are instantiated
by the topical evolution of a text as represented in the
parser's bulletin. Thus, various coherence phenomena
can be distinguished by particular instantiation patterns:
- constant theme is defined by multiple instantiations
  of aggregation (or conceptual association) relations
  for one particular frame item in KB;
- continuous thematization of rhemes is defined
  by multiple instantiations of aggregation relations
  for continuously changing, though locally
  overlapping frame items in KB;
- derived theme is defined by multiple instantiations
  of generalization/classification relations holding
  between subparts of a frame hierarchy in KB.
A more elaborate formal description of this model - including
those parts which could only be treated rather sketchily in this
contribution - is given in Hahn [1991]. The parser is currently
running on SUN SPARCstations under Unix (SUNOS V4.1 A).
The functionality described in this paper is fully operational and
part of the TOPIC text understanding system.
Proc. of COLING-92, Nantes, Aug. 23-28, 1992

References

Alterman, R. [1982]. A system of seven coherence relations for hierarchically
organizing event concepts in text. Univ. of Texas at Austin (TR-188).

Brachman, R.J.; V.P. Gilbert; H.J. Levesque [1985]. An essential hybrid reasoning
system: knowledge and symbol level accounts of Krypton. Proc. IJCAI-85,
pp. 532-539.

Danes, F. [1974]. Functional sentence perspective and the organization of the text.
In F. Danes, ed. Papers on functional sentence perspective. Academia, pp. 106-128.

van Dijk, T.A. [1980]. Macrostructures. Hillsdale/NJ: L. Erlbaum.

Giora, R. [1983a]. Segmentation and segment cohesion: on the thematic organization
of the text. Text, 3(2): 155-181.

Giora, R. [1983b]. Functional paragraph perspective. In J. Petöfi & E. Sözer, eds.
Micro and macro connexity of texts. Hamburg: H. Buske, pp. 153-182.

Hahn, U. [1989]. Making understanders out of parsers. International Journal of
Intelligent Systems, 4(3): 345-393.

Hahn, U. [1990]. Lexikalisch verteiltes Text-Parsing. Berlin: Springer.

Hahn, U. [1991]. Distributed text structure parsing. Linguistische Informatik/
Computerlinguistik, Univ. Freiburg. CLIF-Report 4/91.

Hinds, J. [1979]. Organizational patterns in discourse. In T. Givón, ed. Syntax and
semantics, Vol. 12. New York/NY: Academic Pr., pp. 135-157.

Hobbs, J.R. [1982]. Towards an understanding of coherence in discourse. In W.G.
Lehnert & M. Ringle, eds. Strategies for natural language processing. Hillsdale/
NJ: L. Erlbaum, pp. 223-243.

Kintsch, W.; T.A. van Dijk [1978]. Toward a model of text comprehension and
production. Psychological Review, 85(5): 363-394.

Kurzon, D. [1984]. Themes, hyperthemes and the discourse structure of British
legal texts. Text, 4(1-3): 31-55.

Mann, W.C.; S.A. Thompson [1988]. Rhetorical structure theory: towards a
functional theory of text organization. Text, 8(3): 243-287.

McKeown, K. [1985]. Discourse strategies for generating natural-language text.
Artificial Intelligence, 27(1): 1-41.

Pustejovsky, J. [1987]. An integrated theory of discourse analysis. In S. Nirenburg,
ed. Machine translation. Cambridge: Cambridge U.P., pp. 168-191.

Reichman, R. [1978]. Conversational coherency. Cognitive Science, 2(4): 283-327.

Reimer, U.; U. Hahn [1988]. Text condensation as knowledge base abstraction.
Proc. 4th conf. on artificial intelligence applications (CAIA-88), pp. 338-344.

Rumelhart, D.E. [1975]. Notes on a schema for stories. In D. Bobrow & A. Collins,
eds. Representation and understanding. New York: Academic P., pp. 211-236.

Scha, R.; L. Polanyi [1988]. An augmented context free grammar for discourse.
Proc. COLING-88, pp. 573-577.

Schank, R.C.; M. Lebowitz; L. Birnbaum [1980]. An integrated understander.
American Journal of Computational Linguistics, 6(1): 13-30.

Thiel, U.; R. Hammwöhner [1987]. Informational zooming: an interaction model
for the graphical access to text knowledge bases. Proc. 10th ACM SIGIR conf.
on research & development in information retrieval, pp. 45-56.

Tucker, A.B.; Nirenburg, S.; Raskin, V. [1986]. Discourse and cohesion in
expository text. Proc. COLING-86, pp. 181-183.

Zadrozny, W.; Jensen, K. [1991]. Semantics of paragraphs. Computational
Linguistics, 17(2): 171-209.
