TEXTUAL EXPERTISE IN WORD EXPERTS: 
AN APPROACH TO TEXT PARSING BASED ON TOPIC/COMMENT MONITORING * 
Udo Hahn 
Universitaet Konstanz 
Informationswissenschaft 
Projekt TOPIC
Postfach 5560 
D-7750 Konstanz 1, West Germany
ABSTRACT 
This paper presents prototype versions of two word
experts for text analysis. They demonstrate that word
experts are a feasible tool for parsing texts on the
level of text cohesion as well as text coherence. The
analysis is based on two major knowledge sources:
context information is modelled in terms of a frame
knowledge base, while the co-text keeps record of the
linear sequencing of text analysis. The result of text
parsing is a text graph reflecting the thematic
organization of topics in a text.
1. Word Experts as a Text Parsing Device
This paper outlines an operational repre- 
sentation of the notion of text cohesion and text 
coherence based on a collection of word experts as 
central procedural components of a distributed 
lexical grammar. 
By text cohesion, we refer to the micro level of
textuality as provided, e.g. by reference, substitution,
ellipsis, conjunction and lexical cohesion (cf.
HALLIDAY/HASAN 1976), whereas text coherence relates to
the macro level of textuality as induced, e.g. by
patterns of semantic recurrence of topics (thematic
progression) of a text (cf. DANES 1974). On a deeper
level of propositional analysis of texts, further types
of semantic development of a text can be examined, e.g.
coherence relations, such as contrast, generalization,
explanation (cf. HOBBS 1979, HOBBS 1982, DIJK 1980a),
basic modes of topic development, such as expansion,
shift, or splitting (cf. GRIMES 1978), and operations
on different levels of textual macro-structures (DIJK
1980a) or schematized superstructures (DIJK 1980b).
The identification of cohesive parts of a text is
needed to determine the continuous development and
increment of information with regard to single thematic
foci, i.e. topics of the text. As texts contain topic
elaborations, shifts, breaks, etc., the extension of
topics has to be delimited exactly and different topics
have to be related properly. The identification of
coherent parts of a text serves this purpose, in that
the determination of the coherence relations mentioned
above contributes to the delimitation of topics and
their organization in terms of text grammatical
well-formedness considerations. Text graphs are used as
the resulting structure of text parsing and serve to
represent corresponding relations holding between
different topics.
* Work reported in this paper is supported by
BMFT/GID under grant no. PT 200.08.
The word experts outlined below are part of a
genuine text-based parsing formalism incorporating a
linguistic level in terms of a distributed text grammar
and a computational level in terms of a corresponding
text parser (HAHN/REIMER 1983; for an account of the
original conception of word expert parsing, cf.
SMALL/RIEGER 1982). This paper is intended to provide
an empirical assessment of word experts for the purpose
of text parsing. We thus arrive at a predominantly
functional description of this parsing device,
neglecting to a large extent its procedural aspects.
The word expert parser is currently being
implemented as a major system component of TOPIC, a
knowledge-based text analysis system which is intended
to provide text summarization (abstracting) facilities
on variable layers of informational specificity for
German language texts (each approx. 2000-4000 words)
dealing with information technology. Word expert
construction and modification is supported by a word
expert editor using a special word expert
representation language, fragments of which are
introduced in this paper (for a more detailed account,
cf. HAHN/REIMER 1983, HAHN 1984). Word experts are
executed by interpretation of their representation
language description. TOPIC's word expert system and
its editor are written in the C programming language
and are running under UNIX.
2. Some General Remarks about Word Expert Structure
and the Knowledge Sources Available for Text Parsing
A word expert is a procedural agent incorporating
linguistic and world knowledge about a particular word.
This knowledge is represented declaratively in terms of
a decision net whose nodes are constructed of various
conditions. Word experts communicate among each other
as well as with other system components in order to
elaborate a word's meaning (reading).
The conditions are tested against at least two
kinds of knowledge sources, the context and the co-text
of the corresponding word.
Context is a frame knowledge base which con- 
tains the conceptual world knowledge relevant for 
the texts being processed. Simple conditions to be 
tested in that knowledge base are: 
ACTIVE ( f ) :<==>
    f is an active frame
EISA ( f , f' ) :<==>
    frame f is a subordinate or instance of frame f'
HAS_SLOT ( f , s ) :<==>
    frame f has slot s associated to it
HAS_SVAL ( f , s , v ) :<==>
    slot s of frame f has been assigned the slot value v
SVAL_RANGE ( str , s , f ) :<==>
    string str is a permitted slot value with respect to
    slot s of frame f
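The context predicates can be illustrated by a minimal sketch in Python. It is not TOPIC's implementation (which is written in C): the names Frame and FrameKB, the "parents" field, and the transitive reading of EISA are assumptions made for illustration only. SVAL_RANGE is approximated by the integrity constraint appealed to in sec.3.1, namely that an instance of the frame named like a slot is a permitted value of that slot.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    name: str
    parents: set = field(default_factory=set)   # assumed: superordinate frames / classes
    slots: dict = field(default_factory=dict)   # slot name -> set of assigned slot values
    active: bool = False


class FrameKB:
    """Toy frame knowledge base exposing the five context predicates."""

    def __init__(self):
        self.frames = {}

    def add(self, name, parents=(), slots=(), active=False):
        self.frames[name] = Frame(name, set(parents), {s: set() for s in slots}, active)

    def ACTIVE(self, f):
        return f in self.frames and self.frames[f].active

    def EISA(self, f, f2):
        # Assumption: subordination is followed transitively along `parents`.
        seen, todo = set(), [f]
        while todo:
            cur = todo.pop()
            if cur in seen or cur not in self.frames:
                continue
            seen.add(cur)
            if f2 in self.frames[cur].parents:
                return True
            todo.extend(self.frames[cur].parents)
        return False

    def HAS_SLOT(self, f, s):
        return f in self.frames and s in self.frames[f].slots

    def HAS_SVAL(self, f, s, v):
        return self.HAS_SLOT(f, s) and v in self.frames[f].slots[s]

    def SVAL_RANGE(self, string, s, f):
        # Assumption: a slot accepts subordinates or instances of the frame
        # that carries the same name as the slot.
        return self.HAS_SLOT(f, s) and self.EISA(string, s)


kb = FrameKB()
kb.add("Mikroprozessor")
kb.add("Z-80", parents=["Mikroprozessor"], active=True)
kb.add("Mikrocomputer", slots=["Mikroprozessor", "Hauptspeicher"], active=True)
```

With this toy knowledge base, SVAL_RANGE("Z-80", "Mikroprozessor", "Mikrocomputer") holds, while HAS_SVAL for the same triple does not hold until a slot value assignment has actually been carried out.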
Co-text is a data repository which keeps record of
the sequential course of the text analysis actually
going on - this linear type of information is completely
lost in the context, although it is badly needed for
various sorts of textual cohesion and coherence
phenomena. As co-text necessarily reflects basic
properties of the frame representation structures
underlying the context, some conditions to be tested in
the co-text also take certain aspects of context
knowledge into account:
BEFORE ( exp , str1 , str2 ) :<==>
    str1 occurs maximally exp many transactions before
    str2 in the co-text
AFTER ( exp , str1 , str2 ) :<==>
    str1 occurs maximally exp many transactions after
    str2 in the co-text
IN_PHRASE ( str1 , str2 ) :<==>
    str1 occurs in the same sentence as str2
EQUAL ( str1 , str2 ) :<==>
    str1 equals str2
FACT ( f ) :<==>
    frame f was affected by an activation operation in
    the knowledge base
SACT ( f , s ) :<==>
    slot s of frame f was affected by an activation
    operation in the knowledge base
SVAL ( f , s , v ) :<==>
    slot s of frame f was affected by the assignment of
    a slot value v in the knowledge base
SAME_TRANSACTION ( f , f' ) :<==>
    frame f and frame f' are part of the same
    transaction with respect to a single text token,
    i.e. the set of all operations on the frame
    knowledge base which are carried out due to the
    readings generated by the word experts which have
    been put into operation with respect to this token
From the above atomic predicates more complex
conditions can be generated using common logical
operators (AND, OR, NOT). These expressions are subject
to an implicit existential quantification, unless
specified otherwise.
During the operation of a word expert the 
variables of each condition have to be bound in 
order to work out a truth value. In App.A and App.B 
underlining of variables indicates that they have 
already been bound, i.e. the evaluation of the 
condition in which a variable occurs takes the 
value already assigned, otherwise a value assign- 
ment is made which satisfies the condition being 
tested. 
Items stored in the co-text are in the format
    TOKEN   actual form of text word
    TYPE    normalized form of text word after
            morphological reduction or decomposition
            procedures have operated on it
    ANNOT   annotation indicating whether TYPE is
            identified as
                FRAME   a frame name
                WEXP    a word expert name
                STOP    a stop word
                NUM     a numerical string or
                NIL     an unknown text word
            or TYPE consists of parameters
                frame . slot . sval
            which are affected by a special type of
            operation executed in the frame knowledge
            base which is alternatively denoted by
                FACT    frame activation
                SACT    slot activation
                SVAL    slot value assignment
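As a rough illustration of the co-text format and of two of the co-text predicates, consider the following Python sketch. The record layout (CoTextEntry, its "ops" list), the use of list indices as positions, the treatment of one co-text entry as one transaction, and the detection of sentence boundaries by a "." token are all assumptions of this sketch, not properties of TOPIC. The sample entries are the first nine words of the text segment analysed in sec.3.1.

```python
from dataclasses import dataclass, field


@dataclass
class CoTextEntry:
    no: int
    token: str                                # actual form of the text word
    type: str                                 # normalized form
    annot: str                                # FRAME, WEXP, STOP, NUM or NIL
    ops: list = field(default_factory=list)   # e.g. [("SVAL", ("f", "s", "v"))]


def BEFORE(cotext, exp, str1, pos2):
    """TYPE str1 occurs at most `exp` entries before index pos2 (exp=None: 'any')."""
    lo = 0 if exp is None else max(0, pos2 - exp)
    return any(e.type == str1 for e in cotext[lo:pos2])


def IN_PHRASE(cotext, pos1, pos2):
    """Assumed sentence test: no '.' token lies between the two positions."""
    a, b = sorted((pos1, pos2))
    return all(e.token != "." for e in cotext[a:b])


rows = [("In", "in", "STOP"), ("seiner", "sein", "STOP"),
        ("Grundversion", "-", "NIL"), ("ist", "ist", "STOP"),
        ("der", "der", "STOP"), ("Mikrocomputer", "Mikrocomputer", "FRAME"),
        ("mit", "mit", "STOP"), ("einem", "ein", "STOP"),
        ("Z-80", "Z-80", "FRAME")]
cotext = [CoTextEntry(i + 1, *row) for i, row in enumerate(rows)]
```

Here BEFORE(cotext, 1, "ein", 8) tests whether "ein" immediately precedes the entry at index 8 (Z-80), whereas passing None for exp leaves proximity unconstrained.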
3. Two Word Experts for Text Parsing
We now turn to an operational representation 
of the notions introduced in sec.1. The discussion 
will be limited to well-known cases of textual 
cohesion and coherence as illustrated by the fol- 
lowing text segment: 
[1] In seiner Grundversion ist der Mikrocomputer
    mit einem Z-80 und 48 KByte RAM ausgeruestet
    und laeuft unter CP/M. An Peripherie werden
    Tastatur, Bildschirm und ein Tintenspritzdrucker
    bereitgestellt. Schliesslich verfuegt das System
    ueber 2 Programmiersprachen: Basic wird von
    SystemSoft geliefert und der Pascal-Compiler
    kommt von PascWare.
    [The basic version of the micro is supplied
    with a Z-80 and 48 kbyte RAM and runs under CP/M.
    Peripheral devices provided include a keyboard,
    a CRT display and an ink jet printer. Finally,
    the system makes available 2 programming
    languages: Basic is supplied by SystemSoft while
    PascWare furnishes the Pascal compiler.]
First, in sec.3.1 we will examine textual cohesion
phenomena illustrated by special cases of lexical
cohesion, namely the tendency of terms to share the
same lexical environment (collocation of terms) and the
occurrence of "general nouns" referring to more
specific terms (cf. HALLIDAY/HASAN 1976). Then, in
sec.3.2 our discussion will be centered around various
modes of thematic progression in texts, such as linear
thematization of rhemes (cf. DANES 1974), which is
often used to establish text coherence (for a similar
approach to combine the topic/comment analysis of texts
and knowledge representation based on the frame model,
cf. CRITZ 1982; computational analysis of textual
coherence is also provided by HOBBS 1979, 1982 applying
a logical representation model).
Word experts capable of handling corresponding
textual phenomena are given in App.A and App.B.
However, only simplified versions of word experts
(prototypes) can be supplied, restricting their scope
to the recognition of the text structures under
examination. The representation of the textual analysis
also lacks completeness, skipping a lot of intermediary
steps concerning the operation of other (e.g. phrasal)
types of word experts (for more details, cf. HAHN 1984).
3.1 A Word Expert for Text Cohesion 
We now illustrate the operation of the word
expert designed to handle special cases of text
cohesion (App.A) as indicated by text segment [1].
Suppose the analysis of the text has been carried
out covering the first 9 text words of [1], as
indicated by the entries in co-text:
No.   TOKEN           TYPE            ANNOT
{01}  In              in              STOP
{02}  seiner          sein            STOP
{03}  Grundversion    -               NIL
{04}  ist             ist             STOP
{05}  der             der             STOP
{06}  Mikrocomputer   Mikrocomputer   FRAME
{07}  mit             mit             STOP
{08}  einem           ein             STOP
{09}  Z-80            Z-80            FRAME
The word expert given in App.A starts running
whenever a frame name occurs in the text. Starting at
the occurrence of frame "Mikrocomputer" indicated by
{06} no reading is worked out. At {09} the expert's
input variable "frame" is bound to "Z-80" as it starts
again. A test in the knowledge base indicates that
"Z-80" is an active frame (by default operation).
Proceeding backwards from the current entry in co-text,
the evaluation of nodes #10 and #11 yields TRUE, since
pronoun list contains an element "ein", a morphological
variant of which occurs immediately before frame (Z-80)
within the same sentence. In addition, we set frame' to
"Mikrocomputer" (micro computer) as it is next before
frame (with proximity left unconstrained due to "any")
in correspondence with {06}, and it is an active frame,
too. The evaluation of node #12, finally, produces
FALSE, since frame' (Mikrocomputer) is not a
subordinate or instance of frame (Z-80) - actually,
"Z-80" is an instance of "Mikroprozessor" (micro
processor). Following the FALSE arc of #12 leads to
expression #2 which evaluates to FALSE, as frame'
(Mikrocomputer) is a frame which roughly consists of
the following set of slots (given by indentation)
Mikrocomputer             micro computer
    Mikroprozessor            micro processor
    Peripherie                peripheral devices
    Hauptspeicher             main memory
    Programmiersprache        programming language
    Systemsoftware            system software
Following the FALSE arc of #2, #3 also evaluates to
FALSE, as according to the current state of analysis
context contains no information indicating that frame'
(Mikrocomputer) has a slot' to which any slot value has
been assigned (in addition, "Z-80" is not used as a
default slot value of any of the slots supplied above).
Turning now to the evaluation of #4, slot' has to be
identified, which must be a slot of frame'
(Mikrocomputer), and frame (Z-80) must be within the
range of permitted slot values for slot' of frame'.
Trying "Mikroprozessor" for slot' succeeds, as "Z-80"
is an instance of "Mikroprozessor" and thus (due to
model-dependent semantic integrity constraints inherent
to the underlying frame data model [REIMER/HAHN 1983])
it is a permitted slot value with respect to slot'
(Mikroprozessor), which in turn is a slot of frame'
(Mikrocomputer). Thus, the interpretation of slot' as
"Mikroprozessor" holds.
The execution of word experts terminates if a
reading has been generated. Readings are labels of leaf
nodes of word experts, so following the TRUE arc of #4
the reading SVAL_ASSIGN ( Mikrocomputer ,
Mikroprozessor , Z-80 ) is reached. SVAL_ASSIGN* is a
command issued to the frame knowledge base (as is done
with every reading referring to cohesion properties of
texts) which leads to the assignment of the slot value
"Z-80" to the slot "Mikroprozessor" of the frame
"Mikrocomputer". This operation also gets recorded in
co-text (SVAL). Therefore, entry {09} gets augmented:
No.   TOKEN   TYPE   ANNOT
{09}  Z-80    Z-80   FRAME
                     Mikrocomputer.Mikroprozessor.Z-80 SVAL
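The final step of this walk-through (node #4 and the resulting reading) can be sketched in Python as follows. The sketch covers only this one node, not the complete decision net of App.A; the dictionaries SLOTS and INSTANCE_OF, the function names, and the simplified range check are hypothetical stand-ins for the frame knowledge base.

```python
# Hypothetical miniature knowledge base: slots per frame, and instance -> class.
SLOTS = {"Mikrocomputer": ["Mikroprozessor", "Peripherie", "Hauptspeicher",
                           "Programmiersprache", "Systemsoftware"]}
INSTANCE_OF = {"Z-80": "Mikroprozessor"}


def sval_range(value, slot, frame):
    """Assumed range check: value is permitted if it is an instance of the
    frame that carries the same name as the slot."""
    return slot in SLOTS.get(frame, []) and INSTANCE_OF.get(value) == slot


def node_4(frame, frame2):
    """Try each slot of frame2; return an SVAL_ASSIGN reading or None."""
    for slot in SLOTS.get(frame2, []):
        if sval_range(frame, slot, frame2):
            return ("SVAL_ASSIGN", frame2, slot, frame)
    return None


def record(entry, reading):
    """Append the annotation 'frame.slot.sval SVAL' to a co-text entry (a dict)."""
    _, f, s, v = reading
    entry.setdefault("ops", []).append((f"{f}.{s}.{v}", "SVAL"))
    return entry


reading = node_4("Z-80", "Mikrocomputer")
entry_09 = record({"no": 9, "token": "Z-80", "type": "Z-80", "annot": "FRAME"},
                  reading)
```

Binding slot' by trying the slots of frame' in turn mirrors the "value assignment which satisfies the condition being tested" described in sec.2; the order in which slots are tried is an assumption of the sketch.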
The next steps of the analysis are skipped, 
until a second basic type of text cohesion can be 
examined with regard to {34}: 
{11}  48                    48                    NUM
{13}  RAM                   RAM-1                 FRAME
          RAM-1.Groesse.48 KByte SVAL
          Mikrocomputer.Hauptspeicher.RAM-1 SVAL
{18}  CP/M                  CP/M                  FRAME
          Mikrocomputer.Betriebssystem.CP/M SVAL
{19}  .                     .                     WEXP
{21}  Peripherie            Peripherie            FRAME
          Mikrocomputer.Peripherie SACT
{23}  Tastatur              Tastatur              FRAME
          Mikrocomputer.Peripherie.Tastatur SVAL
{25}  Bildschirm            Bildschirm            FRAME
          Mikrocomputer.Peripherie.Bildschirm SVAL
{28}  Tintenspritzdrucker   Tintenspritzdrucker   FRAME
          Mikrocomputer.Peripherie.Tintenspritzdrucker SVAL
{30}  .                     .                     WEXP
{33}  das                   das                   STOP
{34}  System                System                FRAME
* SWEIGHT_INC ( f , s ), which is also provided in
App.A, says that the activation weight of slot s of
frame f gets incremented.
At {34} the word expert dealing with text cohesion
phenomena again starts running. Its input variable
"frame" is set to "System" (system). With respect to
#10 the evaluation of BEFORE yields a positive result,
since "das", which is an element of pronoun list,
occurs immediately before frame. As the IN_PHRASE
predicate also evaluates to TRUE, the whole expression
#10 turns out to be TRUE. Proceeding backwards to the
next frame which is active in the frame knowledge base,
search stops at position {28}. When more than a single
frame within the same transaction may be referred to by
word experts, the following reference convention is
applied:
[2i]   if ANNOT = FRAME and an annotation of type
       FACT exists, examine the frame corresponding
       to FACT
[2ii]  if ANNOT = FRAME or ANNOT = WEXP and
       annotations of type SACT or SVAL exist, examine
       f as frame, s as slot, and v as slot value,
       resp., according to the order of parameters
       f . s . v
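The reference convention can be read as a small lookup rule over co-text entries, sketched here in Python. The entry layout (a dict with "type", "annot" and an "ops" list of (kind, parameters) pairs), the precedence of [2i] over [2ii] when both apply, the choice of the first matching annotation, and the fallback to TYPE when no annotation exists are assumptions of this sketch; the convention itself does not settle these points.

```python
def referent(entry):
    """Return the (frame, slot, sval) triple a word expert should examine
    for a co-text entry; missing components are None."""
    ops = entry.get("ops", [])
    # [2i]: a FRAME entry with a FACT annotation refers to the activated frame.
    if entry["annot"] == "FRAME":
        for kind, params in ops:
            if kind == "FACT":
                return params[0], None, None
    # [2ii]: SACT/SVAL annotations supply f . s . v in this order.
    if entry["annot"] in ("FRAME", "WEXP"):
        for kind, params in ops:
            if kind in ("SACT", "SVAL"):
                f, s, v = (list(params) + [None, None])[:3]
                return f, s, v
    # Assumed fallback: a plain FRAME entry refers to its own TYPE.
    if entry["annot"] == "FRAME":
        return entry["type"], None, None
    return None


entry_28 = {"type": "Tintenspritzdrucker", "annot": "FRAME",
            "ops": [("SVAL", ("Mikrocomputer", "Peripherie",
                              "Tintenspritzdrucker"))]}
entry_21 = {"type": "Peripherie", "annot": "FRAME",
            "ops": [("SACT", ("Mikrocomputer", "Peripherie"))]}
```

Applied to entry {28}, the rule yields "Mikrocomputer" as the frame to be examined rather than "Tintenspritzdrucker".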
In these cases reference of word experts to the frame
corresponding to the annotation FRAME would cause the
provision of insufficient or even false structural
information about the context of the current lexical
item, although more significant information actually is
available in the knowledge sources. In the word expert
considered, frame' is set to "Mikrocomputer" according
to [2ii]. Following the TRUE arc of #11, expression #12
states that frame' (Mikrocomputer) must be a
subordinate or instance of frame (System), which also
holds TRUE. Thus, one gets the reading SHIFT ( System ,
Mikrocomputer ) which says that the activation weight
of frame (System) has to be decremented (thus
neutralizing the default activation), while the
activation weight of frame' (Mikrocomputer) gets
incremented instead. Based on this re-assignment of
activation weights the system is protected against
invalid activation states, since "Mikrocomputer" is
referred to by "System" due to stylistic reasons only
and no indication is available that a real topical
change in the text is implied, e.g. some generalization
with respect to the whole class of micro computers. We
thus have an augmented entry for {34} in co-text
together with the result of processing the remainder of
[1]:
No.   TOKEN                 TYPE                 ANNOT
{34}  System                System               FRAME
          Mikrocomputer FACT
{36}  2                     2                    NUM
{37}  Programmiersprachen   Programmiersprache   FRAME
          Mikrocomputer.Programmiersprache SACT
{39}  Basic                 Basic                FRAME
          Mikrocomputer.Programmiersprache.Basic SVAL
{42}  SystemSoft            SystemSoft           FRAME
          Basic.Hersteller.SystemSoft SVAL
{46}  Pascal-Compiler       Pascal-Compiler      FRAME
          Mikrocomputer.Systemsoftware.Pascal-Compiler SVAL
                            Pascal
          Mikrocomputer.Programmiersprache.Pascal SVAL
{49}  PascWare              PascWare             FRAME
          Pascal-Compiler.Hersteller.PascWare SVAL
          Pascal.Hersteller.PascWare SVAL
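The effect of the SHIFT reading on activation weights may be pictured by the following Python fragment. The dictionary of weights, the initial values, and the step size of 1 are assumptions for illustration; the paper only states that one weight is decremented and the other incremented.

```python
def shift(weights, frame, frame2):
    """Apply SHIFT(frame, frame2): neutralize the default activation of
    `frame` and credit the activation to `frame2` instead."""
    weights[frame] = weights.get(frame, 0) - 1     # assumed step size: 1
    weights[frame2] = weights.get(frame2, 0) + 1
    return weights


# Hypothetical state just after "System" was activated by default at {34}.
weights = {"Mikrocomputer": 3, "System": 1}
shift(weights, "System", "Mikrocomputer")
```

After the call the general noun "System" carries no activation of its own, and the activation is credited to "Mikrocomputer".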
While expressions #1-#4 of App.A handle the usual kind
of lexical cohesion sequencing in German, a variant
form of lexical cohesion is provided for by #5-#8 with
reverse order of sequencing ("... die Tastatur fuer den
Mikrorechner ..." or "... die Tastatur des Mikros
..."). From this outline one gets a first impression of
the text parsing capabilities inherent to word experts
on the level of text cohesion, as parsing is performed
irrespective of sentence boundaries on a primarily
semantic level of text processing in an inexpensive way
(partial parsing). With respect to other kinds of
cohesive phenomena in texts, e.g. pronominal anaphora,
conjunction, deixis, word experts are available which
are similar in structure, but adapted to identify
corresponding phenomena.
3.2 A Word Expert for Text Coherence 
We now examine the generation of a second type of
reading, so-called coherence readings, concerning the
structural organization of cohesive parts of a text.
Unlike cohesion readings, coherence readings of that
type are not issued to the frame knowledge base to
instantiate various operations, but are passed over to
a data repository in which coherence indicators of
different sorts are collected continuously. A device
operating on these coherence indicators computes text
structure patterns in terms of a text graph which is
the final result of text parsing in TOPIC.
A text graph constructed that way is composed of a
small set of basic coherence relations. We only mention
here the application of further relations due to other
types of linguistic coherence readings (cf. HAHN 1984)
as well as coherence readings from computation
procedures based exclusively on configuration data from
the frame knowledge base (HAHN/REIMER 1984). One common
type of coherence relations is accounted for in the
remainder of this section, which provides for a
structural representation of texts that is already
well-known following DANES' (1974) distinction among
various patterns of thematic progression:
[Figure panels: SPLITTING THEMES (DERIVED THEMES);
SPLITTING RHEMES; CASCADING THEMES (LINEAR
THEMATIZATION OF RHEMES); DESCENDING RHEMES]
Fig.1: Graphical Interpretation of Patterns of
Thematic Progression in Texts
The meaning of the coherence readings provided 
in App.B with respect to the construction of the 
text graph is stated below: 
SPLITTING_RHEMES ( f , f' )
    frame f is alpha ancestor to f'
DESCENDING_RHEMES ( f , f' , f'' )
    frame f is alpha ancestor to f' &
    frame f' is alpha ancestor to f''
CONSTANT_THEME ( f , str )
    frame f is beta ancestor to string str
SPLITTING_THEMES ( f , f' , str )
    frame f is alpha ancestor to f' &
    frame f' is beta ancestor to string str
CASCADING_THEMES ( f , f' , f'' , f''' , str )
    frame f is alpha ancestor to f' &
    frame f' is beta ancestor to f'' &
    frame f'' is alpha ancestor to f''' &
    frame f''' is beta ancestor to string str
SEPARATOR ( f )
    frame f is alpha ancestor to a separator symbol
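Read as graph construction rules, these definitions expand each coherence reading into a chain of labelled ancestor edges. The following Python sketch makes this explicit; the function name, the edge triples, and the "<separator>" placeholder are illustrative choices, not part of the word expert representation language.

```python
# Sequence of ancestor relations contributed by each coherence reading.
PATTERNS = {
    "SPLITTING_RHEMES": ["alpha"],
    "DESCENDING_RHEMES": ["alpha", "alpha"],
    "CONSTANT_THEME": ["beta"],
    "SPLITTING_THEMES": ["alpha", "beta"],
    "CASCADING_THEMES": ["alpha", "beta", "alpha", "beta"],
}


def edges(reading, *params):
    """Expand a reading into (ancestor, relation, descendant) triples."""
    if reading == "SEPARATOR":
        return [(params[0], "alpha", "<separator>")]
    rels = PATTERNS[reading]
    if len(params) != len(rels) + 1:
        raise ValueError(f"{reading} expects {len(rels) + 1} parameters")
    return [(params[i], rel, params[i + 1]) for i, rel in enumerate(rels)]


example = edges("SPLITTING_THEMES", "Mikrocomputer", "Mikroprozessor", "Z-80")
```

In the example, "Mikrocomputer" becomes alpha ancestor to "Mikroprozessor", which in turn becomes beta ancestor to the string "Z-80".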
We now illustrate the operation of the word
expert designed to handle special cases of text
coherence (App.B) as indicated by text segment [1]. It
gets started whenever a frame name has been identified
in the text. Suppose we have frame set to
"Mikrocomputer" with respect to {06}. Since #1 fails
(there is no other frame' available within transaction
{06}), evaluating #2 leads to the assignment of
"Mikrocomputer" to frame' (with respect to {09}), since
according to convention [2ii] and to the entries of
co-text frame' (Mikrocomputer/{09}) occurs after frame
and is immediately adjacent to frame
(Mikrocomputer/{06}); in addition, frame and frame'
belong to different transactions. Thus, #2 is evaluated
TRUE. Obviously, #3 also holds TRUE, whereas #4
evaluates to FALSE, since frame' is annotated by SVAL
according to the co-text instead of SACT, as is
required by #4. Note that only the same transaction (if
#1 holds TRUE) or the next transaction (if #2 holds
TRUE) is examined for appropriate occurrences of SACTs
or SVALs. With respect to #5 the SVAL annotation covers
the following parameters in {09}: frame'
(Mikrocomputer), slot' (Mikroprozessor) and sval'
(Z-80). Proceeding to the next state of the word expert
(#6) we have frame (Mikrocomputer) but no SVAL or SACT
annotation with respect to {06}. Thus, #6 necessarily
gets FALSE, so that, finally, the reading
SPLITTING_THEMES ( Mikrocomputer , Mikroprozessor ,
Z-80 ) is generated.
A second example of the generation of a coherence
reading starts setting frame to "RAM-1" at position
{13} in the co-text. Evaluating #1 leads to the
assignment of "Mikrocomputer" to frame', since two
frames are available within the same transaction. Both
frames being different from each other, one has to
follow the FALSE arc of #3. Similar to the case above,
both transaction elements in {13} are annotated by
SVAL, such that #7 as well as #9 are evaluated FALSE,
thus reaching #11. Since frame (RAM-1) has got no slot
to which frame' (Mikrocomputer) has been assigned, #11
evaluates to FALSE. With respect to #13 we have frame'
(Mikrocomputer) whose slot' (Hauptspeicher) has been
assigned a slot value which equals frame (RAM-1). At
#14, finally, slot (Groesse) and sval (48 KByte) are
determined with respect to frame (RAM-1). The coherence
reading worked out is stated as CASCADING_THEMES (
Mikrocomputer , Hauptspeicher , RAM-1 , Groesse ,
48 KByte ).
Completing the coherence analysis of text segment
[1] at last yields the final expansion of co-text (note
that both word experts described operate in parallel,
as they are activated by the same starting criterion):
No.   READING            PARAMETERS
{09}  SPLITTING_THEMES   Mikrocomputer.Mikroprozessor.Z-80
{13}  SPLITTING_THEMES   Mikrocomputer.Hauptspeicher.RAM-1
      CASCADING_THEMES   Mikrocomputer.Hauptspeicher.RAM-1.
                         Groesse.48 KByte
{18}  SPLITTING_THEMES   Mikrocomputer.Betriebssystem.CP/M
{21}  SPLITTING_RHEMES   Mikrocomputer.Peripherie
{23}  SPLITTING_THEMES   Mikrocomputer.Peripherie.Tastatur
{25}  SPLITTING_THEMES   Mikrocomputer.Peripherie.Bildschirm
{28}  SPLITTING_THEMES   Mikrocomputer.Peripherie.
                         Tintenspritzdrucker
{34}  SEPARATOR          Mikrocomputer
{37}  SPLITTING_RHEMES   Mikrocomputer.Programmiersprache
{39}  SPLITTING_THEMES   Mikrocomputer.Programmiersprache.Basic
{42}  CASCADING_THEMES   Mikrocomputer.Programmiersprache.Basic.
                         Hersteller.SystemSoft
{46}  SPLITTING_THEMES   Mikrocomputer.Systemsoftware.
                         Pascal-Compiler
      SPLITTING_THEMES   Mikrocomputer.Programmiersprache.Pascal
{49}  CASCADING_THEMES   Mikrocomputer.Systemsoftware.
                         Pascal-Compiler.Hersteller.PascWare
      CASCADING_THEMES   Mikrocomputer.Programmiersprache.Pascal.
                         Hersteller.PascWare
The word expert just discussed accounts for a
single frame (here: Mikrocomputer) with nested slot
values of arbitrary depth. This basic description has
to be changed only slightly to account for knowledge
structures which are implicitly connected in the text.
Basically divergent types of coherence patterns are
worked out by word experts operating on, e.g. aspectual
or contrastive coherence relations (cf. HAHN 1984).
4. The Generation of Text Graphs Based on 
Topic/Comment Monitoring 
The procedure of text graph generation for this
basic type of thematic progression can be described as
follows. After initialization by drawing upon the first
frame entry occurring in co-text, the text graph gets
incrementally constructed whenever a new coherence
reading is available in the corresponding data
repository. Then, it has to be determined whether its
first parameter equals the current node of the text
graph, which is either the leaf node of the initialized
text graph (when the procedure starts) or the leaf node
of the topic/comment subgraph which has previously been
attached to the text graph. If equality holds, the
coherence reading is attached to this node of the graph
(including some merging operation to exclude redundant
information from the text graph). If equality does not
hold, remaining siblings or ancestors (in this order)
are tried, until a node equal to the first parameter of
the current coherence reading is found, to which the
reading will be attached directly. If no matching node
in the text graph can be found, a new text graph is
constructed which gets initialized by the current
coherence reading. The text graph as the result of
parsing of the text segment [1] with respect to the
coherence readings generated in sec.3.2 is provided in
App.C.
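The attachment procedure just described can be sketched in Python under some simplifying assumptions: each coherence reading is represented only by its parameter list, alpha/beta edge labels and SEPARATOR readings are left out, "siblings" are taken to be the siblings of the current node only, and merging is reduced to reusing an existing child with the same label. The names Node, candidates and build_text_graphs are hypothetical.

```python
class Node:
    def __init__(self, label, parent=None):
        self.label, self.parent, self.children = label, parent, []

    def child(self, label):
        """Return the child with this label, creating it if absent (merging)."""
        for c in self.children:
            if c.label == label:
                return c
        c = Node(label, self)
        self.children.append(c)
        return c


def candidates(node):
    """Current node first, then its siblings, then its ancestors."""
    yield node
    if node.parent:
        yield from (s for s in node.parent.children if s is not node)
    ancestor = node.parent
    while ancestor:
        yield ancestor
        ancestor = ancestor.parent


def build_text_graphs(first_frame, readings):
    """Attach each reading (a parameter list) to the text graph(s)."""
    graphs = [Node(first_frame)]
    current = graphs[0]
    for path in readings:
        anchor = next((n for n in candidates(current) if n.label == path[0]), None)
        if anchor is None:                 # no matching node: new text graph
            anchor = Node(path[0])
            graphs.append(anchor)
        for label in path[1:]:
            anchor = anchor.child(label)
        current = anchor                   # leaf of the subgraph just attached
    return graphs


readings = [["Mikrocomputer", "Mikroprozessor", "Z-80"],
            ["Mikrocomputer", "Hauptspeicher", "RAM-1"],
            ["Mikrocomputer", "Hauptspeicher", "RAM-1", "Groesse", "48 KByte"]]
graphs = build_text_graphs("Mikrocomputer", readings)
```

For the first three readings of text segment [1] the sketch yields a single graph rooted in "Mikrocomputer"; the third reading merges with the "Hauptspeicher"/"RAM-1" path attached before and only adds the nodes below "RAM-1".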
Note that the text graph generation procedure
allows for an interpretation of basic coherence
readings supplied by various word experts in terms of
compound patterns of thematic progression, e.g. as
given by the exposition of splitting rhemes (DANES
1974). Nevertheless, the whole procedure essentially
depends upon the continuous availability of reference
topics to construct a coherent graph. Accordingly, the
graph generation procedure also operates as a kind of
topic/comment monitoring device. Obviously, one also
has to take into account defective topic/comment
patterns in the text under analysis. The SEPARATOR
reading is a basic indicator of interruptions of
topic/comment sequencing. Its evaluation leads to the
notion of topic/comment islands for texts which only
partially fulfill the requirements of topic/comment
sequencing. Further coherence readings are generated by
computations based solely on world knowledge
indicators, generating condensed lists of dominant
concepts (lists of topics instead of topic graphs)
(HAHN/REIMER 1984).
5. Conclusion 
In this paper we have argued in favor of a word
expert approach to text parsing based on the notions of
text cohesion and text coherence. Readings worked out
by word experts are represented in text graphs which
illustrate the topic/comment structure of the
underlying texts. Since these graphs represent the
texts' thematic structure, they lend themselves easily
to abstracting purposes. Coherency factors of the text
graphs generated, the depth of each text graph, the
amount of actual branching as compared to possible
branching, etc. provide overt assessment parameters
which are intended to control abstracting procedures
based on the topic/comment structure of texts. In
addition, as much effort will be devoted to graphical
modes of system interaction, graph structures are a
quite natural and direct medium of access to TOPIC as a
text information system.
ACKNOWLEDGEMENTS 
I would like to express my deep gratitude to 
U. Reimer for many valuable discussions we had on 
the word expert system of TOPIC. R. Hammwoehner 
and U. Thiel also made helpful remarks on an ear- 
lier version of this paper. 
REFERENCES 
Critz, J.T.: Frame Based Recognition of Theme 
Continuity. In: COLING 82: Proc. of the 9th 
Int. Conf. on Computational Linguistics. 
Prague: Academia, 1982, pp.71-75. 
Danes, F.: Functional Sentence Perspective and the 
Organization of the Text. In: F. Danes (ed): 
Papers on Functional Sentence Perspective. The 
Hague, Paris: Mouton, 1974, pp.106-128. 
Dijk, T.A. van: Text and Context: Explorations in
the Semantics and Pragmatics of Discourse.
London, New York: Longman, (1977) 1980 (a).
Dijk, T.A. van: Macrostructures: An Interdisciplinary
Study of Global Structures in Discourse,
Interaction, and Cognition. Hillsdale/NJ: L.
Erlbaum, 1980 (b).
Grimes, J.E.: Topic Levels. In: TINLAP-2: Theoreti- 
cal Issues in Natural Language Processing-2. 
New York: ACM, 1978, pp.104-108. 
Hahn, U.: Textual Expertise in Word Experts: An
Approach to Text Parsing Based on Topic/Comment
Monitoring (Extended Version). Konstanz: Univ.
Konstanz, Informationswissenschaft, (May) 1984
(= Bericht TOPIC-9/84).
Hahn, U. & Reimer, U.: Word Expert Parsing: An
Approach to Text Parsing with a Distributed
Lexical Grammar. Konstanz: Univ. Konstanz,
Informationswissenschaft, (Nov) 1983 (= Bericht
TOPIC-6/83). [In: Linguistische Berichte,
No.88, (Dec) 1983, pp.56-78. (in German)]
Hahn, U. & Reimer, U.: Computing Text Constituency:
An Algorithmic Approach to the Generation of
Text Graphs. Konstanz: Univ. Konstanz,
Informationswissenschaft, (April) 1984 (= Bericht
TOPIC-8/84).
Halliday, M.A.K. / Hasan, R.: Cohesion in English. 
London: Longman, 1976. 
Hobbs, J.R.: Coherence and Coreference. In: Cognitive
Science 3. 1979, No.1, pp.67-90.
Hobbs, J.R.: Towards an Understanding of Coherence
in Discourse. In: W.G. Lehnert / M.H. Ringle
(eds): Strategies for Natural Language
Processing. Hillsdale/NJ, London: L. Erlbaum,
1982, pp.223-243.
Reimer, U. & Hahn, U.: A Formal Approach to the
Semantics of a Frame Data Model. In: IJCAI-83:
Proc. of the 8th Int. Joint Conf. on Artificial
Intelligence. Los Altos/CA: W. Kaufmann, 1983,
pp.337-339.
Small, S. / Rieger, C.: Parsing and Comprehending
with Word Experts (a Theory and its Realization).
In: W.G. Lehnert / M.H. Ringle (eds):
Strategies for Natural Language Processing.
Hillsdale/NJ: L. Erlbaum, 1982, pp.89-147.
[App.A (word expert for text cohesion), App.B (word
expert for text coherence) and App.C (text graph for
text segment [1]) are figures in the original and are
not reproduced here.]
