THE TEXTUAL DEVELOPMENT OF NON-STEREOTYPIC CONCEPTS 
Karin Haenelt and Michael Könyves-Tóth
Integrated Publication and Information Systems Institute (IPSI) 
GMD 
Dolivostraße 15, D 6100 Darmstadt, Germany
haenelt@ipsi.darmstadt.gmd.dbp.de 
koenyves@ipsi.darmstadt.gmd.dbp.de 
tel. ++49/(0)6151/875-811, fax -818 
ABSTRACT 
In this paper the text theoretical foundation of 
our text analysis system KONTEXT is described. 
The basic premise of the KONTEXT model is that 
new concepts are communicated by using the 
mechanisms of text constitution. The text model 
used assumes that the information conveyed in a 
text and the information describing its contextual 
organization can be structured into five layers 
(sentence structure, information on thematic pro- 
gression, referential structure, conceptual repre- 
sentation of the text and conceptual background 
knowledge). The text analysis component con- 
structs and traverses the information of these lay- 
ers under control of the discourse development. In 
this way, it can incrementally construct a textual 
view on knowledge, rather than only recognizing 
precoded concepts. 
1 INTRODUCTION 
In the field of knowledge-based text analysis it
has been regarded as insufficient for some time to
analyze a text against the background of static and
stereotypic default assumptions (cf. [Hellwig84],
[Scha/Bruce/Polanyi87]). With this method the
pre-coded concepts are invoked again and again
during the process of text analysis, regardless of
the changes and the new concepts being constituted
by the ongoing text. The function of a text,
however, is not confined to concept
selection as in current knowledge-based applica- 
tions. In addition, textual mechanisms are used to 
operate on concepts and to compose them to actual 
contexts, i.e. to constitute (new) concepts. Textu- 
ally the contexts are established by the thematic 
and by the referential structure. Thus, new mecha- 
nisms are required which permit the textual orga- 
nization to control the creation and manipulation 
of concepts in text processing. In a way, this ties
linguistic and knowledge-based approaches to
text processing together into a single method.
2 THE KONTEXT MODEL 
The basic premise of the KONTEXT model is
that the relationship between expression and concept
changes during a text and that concepts are
communicated by using the mechanisms of text
constitution. The KONTEXT model is based on the
assumption that
• the information conveyed in a text and the 
information describing its contextual orga- 
nization can be structured into five layers. 
They define the sentence structure, informa- 
tion on thematic progression, the referential 
structure, the conceptual representation of 
the text and the conceptual background 
knowledge; 
• discourse provides the basic mechanisms by 
which concepts are constructed. Discourse 
is defined as sequences of transitions be- 
tween discourse states and discourse states 
are defined by the information represented 
in the layers. 
The text analysis component constructs and 
traverses the information of these layers under 
control of the discourse development. In this way, 
it can incrementally construct a textual view on 
knowledge, rather than only recognizing pre- 
coded concepts. 
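The two assumptions above can be illustrated with a small data structure. The following is only a minimal sketch, not the KONTEXT implementation: the names DiscourseState, transition and run_discourse, and the choice of sets and tuples as layer contents, are assumptions made for illustration. It shows a discourse state as a snapshot of the five layers and a discourse as a sequence of transitions between such states.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DiscourseState:
    """One discourse state: a snapshot of all five layers (hypothetical encoding)."""
    sentence_structure: tuple = ()                   # linguistic means read so far
    thematic_structure: tuple = ()                   # sequence of contexts
    referential_structure: frozenset = frozenset()   # reference objects
    view: frozenset = frozenset()                    # text-specific concepts
    background: frozenset = frozenset()              # open-world concepts


def transition(state, expression, new_refs=(), new_concepts=()):
    """Interpret one expression and return the successor state."""
    themes = state.thematic_structure
    if new_refs:
        # New reference objects are clustered into a new context.
        themes = themes + (frozenset(new_refs),)
    return replace(
        state,
        sentence_structure=state.sentence_structure + (expression,),
        thematic_structure=themes,
        referential_structure=state.referential_structure | frozenset(new_refs),
        view=state.view | frozenset(new_concepts),
    )


def run_discourse(initial, steps):
    """Return the list of states produced by a sequence of interpretation steps."""
    states = [initial]
    for expression, refs, concepts in steps:
        states.append(transition(states[-1], expression, refs, concepts))
    return states
```

Because each state is immutable, earlier states remain available after a transition, so two states can be compared later; the background layer is left unchanged by every transition in this sketch.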
We will now describe the layers of the text
representation. In the following section we discuss the
conception of discourse in more detail.
2.1 LAYERS OF TEXT REPRESENTATION
There are five layers of text representation: 
sentence structure 
thematic structure 
referential structure 
view 
background knowledge 
The lowest layer is the basis for textual
communication. It is a formal representation of
concepts modeling an open world and serves as
background knowledge. Since we allow for the
construction of new details and concepts, an
organization of concepts is provided which supports this
task. Our background knowledge differs from tra- 
ditional knowledge bases in that it does not repre- 
sent a particular domain model which assigns a 
predefined and fixed structure to the concepts. It is 
rather organized around expressions and models 
their referential potential in terms of concepts. It 
resembles a meaning dictionary (e.g. [COBUILD87],
which is used as the basic material), where
concepts are constituted by expressions and are
used to explain other concepts. Basically all
concepts are of the same rank with respect to an
open world. During discourse the concepts are
accessed via explicitly modeled perspectives on
them [Kunze90] [Melcuk87], depending on the
actual textual development (e.g. the actual state of
contexts, cf. 2.2, discourse state).
The next layer, the view, models the subject
matter of the text using the concepts which are
defined in the background knowledge. The ongoing
discourse selects concepts from the background 
knowledge or the already existing view, reorga- 
nizes their structure and (re-)integrates them co- 
herently into the already existing view. The con- 
cepts constructed in the view during discourse 
provide the text specific perspective on the back- 
ground knowledge. 
The layer of the referential structure represents
reference objects and their relationships. It drops 
details of the concept definition in accordance 
with the abstraction level of references in the text, 
and represents those complexes as units which are 
explicitly referred to by linguistic means in the 
text. 
The layer of thematic structure traces the dis- 
course development. It represents the contextual 
clustering of reference objects and traces the de- 
velopment of their clustering. This trace repre- 
sents the progression of themes and the develop- 
ment of focusing. The notion of thematic structure 
is based on the Prague School approaches to the 
thematic organization (e.g. [Danes70],
[Hajicová/Sgall88], [Hajicová/Vrbová82]), which we refine
by distinguishing the mechanisms involved in
terms of the textual function of linguistic means
with respect to the different layers of the text
representation.
In our model the units of the layer of thematic 
structure are contexts. By context we understand a 
cluster of reference objects, where within a con- 
text the relationship between a reference expres- 
sion and its reference object is unequivocal. Dur- 
ing the ongoing discourse, however, this relation- 
ship and the groups of reference objects which are 
clustered together change. Whether or not lingui- 
stic means create new contexts, and which kind of 
clustering of reference objects they effect, de- 
pends on their textual function and on the state of 
discourse they operate on (examples of this are 
given below). Contexts are the units of the thema- 
tic progression. It is this grouping of reference ob- 
jects that is referred to by linguistic means imme- 
diately, that is changed, resumed, revised and tied 
up to during discourse. The thematic structure is 
the result of creating, closing and referring to
contexts. The movement of contexts traces the growth
of the view. 
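The three operations on contexts named above (creating, closing, referring) can be sketched as follows. This is a deliberately simplified illustration under stated assumptions: the class ThematicStructure and its method names are hypothetical, a context is encoded as a dictionary from reference expressions to reference objects (so that the relationship is unequivocal within one context), and the innermost-first lookup is chosen for brevity only, not as a claim about the access structure of the model.

```python
class ThematicStructure:
    """Trace of creating, closing and referring to contexts (hypothetical sketch)."""

    def __init__(self):
        self.contexts = []  # every context ever created, in order of creation
        self.open = []      # indices of contexts still open, innermost last
        self.trace = []     # (operation, context index) pairs

    def create(self, bindings):
        """Open a new context; bindings maps expressions to reference objects."""
        self.contexts.append(dict(bindings))
        index = len(self.contexts) - 1
        self.open.append(index)
        self.trace.append(("create", index))
        return index

    def close(self):
        """Close the innermost open context."""
        index = self.open.pop()
        self.trace.append(("close", index))
        return index

    def refer(self, expression):
        """Resolve an expression in the innermost open context that binds it."""
        for index in reversed(self.open):
            if expression in self.contexts[index]:
                self.trace.append(("refer", index))
                return self.contexts[index][expression]
        return None  # no open context binds the expression


ts = ThematicStructure()
ts.create({"dictionaries": "<dict>"})
ts.create({"dictionaries": "<eldict>"})
print(ts.refer("dictionaries"))  # resolved in the innermost context
```

The same expression resolves to different reference objects depending on which contexts are open, and the recorded trace is the part of the sketch that corresponds to the thematic progression.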
It should be noted that complex progression types
can be constructed. This is due to the ability of
predicative expressions to cover several themes
by virtue of their arity, and to the textual
possibility of changing the structure of a contextually
clustered concept by changing the focus when
referring to a context. Therefore hierarchical
structures, as proposed by different approaches to
describing the structuring of actual texts, are not
sufficient to cope with the ability of natural language
texts to constitute contextual relations (cf.
content-oriented structures, e.g. thematic progression
[Danes70], of which at least the five forms
elaborated are hierarchical; or discourse
segmentations, e.g. discourse constituent units
[Polanyi88], context spaces [Reichman85],
rhetorical structures [Mann/Thompson87],
superstructures and macrostructures [vanDijk83]).
The sentence structure describes the linguistic 
means used in the text to express the information 
encoded in the lower layers. 
Our representation models structural
relationships of text constitution principles. The
background knowledge provides concepts for the
constitution of the semantic text representation
(view). The concepts constructed in the view
during discourse provide the text specific perspective
on the background knowledge.
Referential structure and thematic structure each 
cluster structures of the lower layers. Reference 
objects group conceptual definitions into units 
which can be referred to by ensuing linguistic ex- 
pressions. The sequence of thematizing defines a 
clustering of reference objects into contexts. 
Whilst the lower layers contain more static
information which is independent of the actual
sequence of the textual presentation, the dynamics of
discourse, i.e. the growth of the view during the
ongoing discourse, are represented in the layers of
thematic structure and sentence structure.
The modeling allows for a text driven control of 
operations on the knowledge base and on the view, 
because the manipulations of the lower layers de- 
pend on the interpretation of the upper layer phe- 
nomena. 
We define the types of manipulations necessary in 
terms of the contribution linguistic means make to 
the layers of the text representation. The
definitions are placed in a text lexicon (cf. the example
given below).
2.2 DISCOURSE 
By discourse we understand a sequence of 
state transitions which is guided by the interpreta- 
tion of linguistic means. It models textual access 
to concepts: A text does not communicate con- 
cepts at once. It rather guides sequential access 
and operations on knowledge that produce a par- 
ticular view on the concepts of the background 
knowledge. 
A discourse state is defined by the actual state 
of all the five layers of the text representation, 
which renders the actual state of the view and the 
actual access structure to view and background
knowledge. While the view grows during the analysis,
only a small segment of it is in the focus of
attention at one state, and the objects which are
referred to by linguistic expressions may change
state by state. A discourse state provides the
immediate context to which ensuing linguistic
means can refer directly, and also previous
contexts.
The transition of a discourse state is the effect 
of the interpretation of a linguistic expression. It is
determined by the textual function of linguistic 
means. Modeling the operational semantics of lin- 
guistic means within the framework outlined 
leads to our text lexica. 
Differences of the view of two discourse states 
which are produced by a discourse state transition 
can be regarded as the semantic contribution of a 
linguistic expression. But it is important to note 
that this contribution is not only determined by the 
isolated expression, and that therefore analysis 
does not involve a static mapping from a textual 
expression to some semantic representation or 
vice versa. The contribution rather depends on the 
actual state of the preceding discourse on which 
the expression operates. Note also that there are 
expressions whose interpretation does not con- 
tribute to the growth of the view. In an actual text 
they rather are used in order to manipulate the the- 
matic organization (e.g. redirections). 
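The idea that the semantic contribution of an expression is the difference between the views of two successive discourse states can be made concrete in a few lines. The sketch below is an assumption-laden illustration: views are encoded as sets of concept names, and the lexicon entries (including the placeholder key "<redirection>") are hypothetical and not taken from the KONTEXT text lexicon.

```python
def semantic_contribution(view_before, view_after):
    """Concepts present after a transition that were absent before it."""
    return view_after - view_before


def interpret(view, expression, lexicon):
    """Return the successor view; each lexicon entry is a function of the current view."""
    return view | lexicon[expression](view)


def dictionaries(view):
    # Hypothetical entry: reuse the concept established by the text, if any.
    if "electronic dictionary" in view:
        return frozenset()
    return frozenset({"dictionary"})


def redirection(view):
    # Hypothetical entry: affects thematic organization only, adds no concept.
    return frozenset()


LEXICON = {"dictionaries": dictionaries, "<redirection>": redirection}

empty_view = frozenset()
grown_view = interpret(empty_view, "dictionaries", LEXICON)
print(semantic_contribution(empty_view, grown_view))
```

Since every entry receives the current view as its argument, the same expression can yield different contributions in different states, which is why no static mapping from expression to representation is assumed here.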
3 EXAMPLE 
With a small example we illustrate how the
KONTEXT model works. We show how a reference
object and a concept corresponding to a
referential expression are created, and how the
relationship between expression and concept is
changed during the discourse. From a sample text
we take the following sentence and show that
discourse state transitions already occur while
interpreting this sentence textually:
"The electronic dictionaries that are the goal of
EDR will be dictionaries of computers, by computers,
and for computers."
We provide a selection of three discourse states 
showing view and access structure after the
interpretation of "The electronic dictionaries" (figure 1),
after "that are the goal of EDR" (figure 2), and
after "will be dictionaries of computers, by
computers, and for computers." (figure 3). Each figure
then is explained by describing the textual func- 
tion of the linguistic means concerned, i.e. by de- 
scribing how they operate on previous discourse 
states and what their contribution to the layers of 
the text representation is. These definitions are 
placed in a text lexicon. Because we want to draw 
the attention to the nature of textual functions of 
linguistic means and to the possibility to distin- 
guish and to describe these functions with respect 
to the layers of the text representation, we confine 
ourselves to demonstrating this by discussing only 
those readings which lead to a solution in our ex- 
ample. 
The sentence structure used is the structure the
PLAIN grammar [Hellwig80] attributes to a
sentence, and for the graphical representation of our
example we use the conventions explained in the
legend (see below). The names of the roles in the
view and in the background knowledge have been
chosen for mnemotechnical reasons only; they are
not to be confused with the conceptual modeling
of prepositions.
LEGEND
[Figure: graphical conventions. Sentence structure: (SYNTACTIC FUNCTION expression1 expression2 expression3); thematic structure; referential structure: <reference object> -reference relation-; view.]
Figure 1: "The electronic dictionaries" 
"The electronic dictionaries": In the sentence 
structure the reference expression "the electronic 
dictionaries" occurs. Since so far no correspond- 
ing reference object exists, it must be created and 
conceptually defined. No previous textual context 
has been established before this state, therefore 
immediate access to the global and unspecified 
background concepts is allowed. [COBUILD87]
does not have an entry "electronic dictionary",
which means that in the background knowledge no
corresponding concept exists.
[Figure: sentence structure, view and background knowledge]
Fig. 1: Discourse state after the interpretation of
"The electronic dictionaries ..."
"electronic": As an adjective, "electronic" refers 
to the reference item --elX-, which does not select 
a concept, but a conceptual structure which is used 
to extend or to modify the dominating noun's
concept. In [COBUILD87] there are two conceptual
aspects of "electronic", which are related to each
other. First, "electronic" can be 'a device, which
has silicon chips and is used as a means for
electronic methods'. Secondly, 'a method' can be
referred to as "electronic".
"dictionary": Initially "dictionary" refers to the 
reference object <dict>. Conceptually
"dictionary" can refer to two aspects: It can refer to 'a
physical device, which is made of paper and serves as a
medium for recording symbols; it has been
compiled by an author and is used for reference
purposes.' It can also refer to 'the recorded symbols as
a work'.
"electronic dictionary": In order to find a
conceptual definition of the introduced reference
object <eldict> we create a less specific abstract
concept of dictionary. On the one hand it must be as
specific as possible, and on the other hand it must 
be compatible with what is known conceptually 
about the referential item --elX-. 'Electronic dic- 
tionary' then is a combination of 'electronic' and 
'dictionary' leaving open e.g. the incompatible 
device 'paper'. A more specific concept of
"dictionary" is introduced. This means that from now
on the text will not deal with "dictionaries" in gen- 
eral, but with "dictionaries" in the restricted con- 
text of "electronic dictionaries". Therefore a new 
context is opened, and in this new context "dictio- 
naries" refers to a new reference object <eldict> 
which can be the theme of the further ongoing dis- 
course. 
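The construction of the concept for <eldict> described above can be sketched as a small function. This is an illustration under explicit assumptions: concepts are encoded as dictionaries from role names to fillers, the role names (kind, made_of, used_for, compiled_by, has) are hypothetical mnemonics, and the judgment of which features are incompatible is passed in as data rather than derived.

```python
def combine(noun, modifier, incompatible):
    """Generalize the noun concept, then extend it with the modifier's features.

    incompatible is a set of (role, filler) pairs of the noun concept that
    cannot be kept alongside the modifier; those roles are left open.
    """
    # Step 1: the less specific concept keeps only compatible features.
    combined = {role: filler for role, filler in noun.items()
                if (role, filler) not in incompatible}
    # Step 2: add the modifier's features without overwriting existing roles.
    for role, filler in modifier.items():
        combined.setdefault(role, filler)
    return combined


dictionary = {"kind": "device", "made_of": "paper",
              "used_for": "reference", "compiled_by": "author"}
electronic = {"kind": "device", "has": "silicon chips"}

electronic_dictionary = combine(dictionary, electronic, {("made_of", "paper")})
print(electronic_dictionary)
```

The background concept of "dictionary" is not modified; the combined concept is a new object that would belong to the view, with the material role left open.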
Figure 2: "that are the goal of EDR" 
[Figure: sentence structure, thematic structure, referential structure (<eldict>, <eldictEDR>, <goal> -of- <EDR>) and background knowledge]
Fig. 2: Discourse state after the interpretation of
"The electronic dictionaries that are the goal of EDR ..."
"that": This relative pronoun, again, forces the
creation of a new context. A new context is
opened which is restricted to those "electronic
dictionaries" only, which "are the goal of EDR". The
pronoun also has the function of a connexion
instruction [Kallmeyer/etal77] and effects a
referential equation of "electronic dictionaries" and what
is predicated about "that". Both expressions and
also "that" then refer to <eldictEDR> in this new
context.
"are": It is the textual function of the copula to 
form a unified context of the contexts of its subject 
("that") and its predicative complement ("the goal 
of EDR"). The unified context defines the refer- 
ence object <eldictEDR>. 
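The effect of the relative pronoun and the copula on the referential structure can be approximated as a merge of two contexts followed by a rebinding. The sketch below rests on assumptions: the function name unify is hypothetical, contexts are encoded as dictionaries from expressions to reference objects, and the unified reference object is supplied by the caller.

```python
def unify(subject_context, complement_context, subject, complement, unified_object):
    """Form one context in which subject and complement refer to the same object."""
    # Reference objects that are equated by the copula.
    equated = {subject_context[subject], complement_context[complement]}
    merged = {**subject_context, **complement_context}
    # Every expression bound to an equated object is rebound to the unified one.
    return {expression: (unified_object if obj in equated else obj)
            for expression, obj in merged.items()}


subject_context = {"electronic dictionaries": "<eldict>", "that": "<eldict>"}
complement_context = {"the goal of EDR": "<goal>", "EDR": "<EDR>"}

unified = unify(subject_context, complement_context,
                "that", "the goal of EDR", "<eldictEDR>")
print(unified["electronic dictionaries"])
```

The input contexts are left untouched, so the earlier, less restricted context remains available as a previous context.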
Figure 3: "will be dictionaries of computers, by
computers, and for computers"
[Figure: sentence structure, thematic structure, referential structure (<dict1> -of- <comp1>, <dict2> -by- <comp2>, <dict3> -for- <comp3>), view and background knowledge]
Fig. 3: Discourse state after the interpretation of
"... will be dictionaries of computers, by c., and for c."
"dictionaries": The expression "dictionaries of
computers, by computers, and for computers"
refers to three reference objects <dict1>, <dict2>
and <dict3> (namely "dictionary" in the context
of "of", "by", and "for"). The three contexts
established for these reference objects are textually
focused on and thus provide the basis for further
textual progression.
"will be": The copula, again, forms a unified con- 
text of the contexts of its subject and its predica- 
tive complement. This also effects a referential 
equivalence of "electronic dictionary" and
"dictionary". Therefore "dictionary" must at this state
of the discourse no longer access the concept of 
"dictionary" of the background knowledge as 
freely as at the beginning of the text, when there 
was no restriction in interpretation. Now it rather 
must access the concept which meanwhile has 
been established by the text (namely 'dictionary' 
in the sense in which it has been modified and
defined by 'electronic').
"of, by, for": make further conceptual contribu- 
tions to the concept of "electronic dictionaries" by 
refining the concept by the aspects denoted by 
"of", "by" and "for". 
4 CONCLUSION 
The model described in this contribution 
serves as a theoretical foundation of a computer 
implementation of a text analysis system. It en- 
ables us to model a discourse which can simulate 
the communication of new concepts. In this simu- 
lation concepts are constituted sequentially by 
means of state transitions which are the effect of 
the interpretation of the actual textual usage of a 
limited set of linguistic means. This technique of- 
fers the possibility to create actual concepts on the 
basis of globally and unspecifically defined con- 
cepts. Thus texts are regarded as construction in- 
structions which guide the incremental construc- 
tion of views on conceptual knowledge bases. 
5 REFERENCES 
[COBUILD87] Sinclair, John (ed. in chief):
Collins COBUILD English Language Dictionary.
London, Stuttgart: 1987.
[Danes70] Danes, Frantisek: Zur linguistischen
Analyse der Textstruktur. In: Folia Linguistica 4,
1970, pp. 72-78
[Hajicová/Sgall88] Hajicová, Eva; Sgall, Petr:
Topic and Focus of a Sentence and the Patterning
of a Text. In: Petöfi, János S. (ed.): Text and
Discourse Constitution. Berlin: 1988. pp. 70-96
[Hajicová/Vrbová82] Hajicová, Eva; Vrbová,
Jarka: On the Role of the Hierarchy of Activation in
the Process of Natural Language Understanding.
In: Horecky, J. (ed.): Proc. of COLING 1982, pp.
107-113
[Hellwig84] Hellwig, Peter: Grundzüge einer
Theorie des Textzusammenhangs. In: Rothkegel,
A.; Sandig, B. (eds.): Text-Textsorten-Semantik:
linguistische Modelle und maschinelle
Anwendung. Hamburg, 1984. pp. 51-59
[Hellwig80] Hellwig, Peter:
Bausteine des Deutschen. Germanistisches
Seminar, Universität Heidelberg 1980
[Kallmeyer/etal77] Kallmeyer, Werner; Klein,
Wolfgang; Meyer-Hermann, Reinhard; Netzer,
Klaus; Siebert, Hans-Jürgen: Lektürekolleg zur
Textlinguistik. Band 1: Einführung.
Kronberg/Ts.: 2. Aufl. 1977 (1. Aufl. 1974)
[Kunze90] Kunze, Jürgen: Kasusrelationen und
Semantische Emphase. To appear in: Studia
Grammatica 1990
[Mann/Thompson87] Mann, William C.;
Thompson, Sandra A.: Rhetorical Structure Theory: A
Theory of Text Organization. In: Livia Polanyi
(ed.): The Structure of Discourse. Norwood, N.J.:
1987
[Polanyi88] Polanyi, Livia: A Formal Model of
the Structure of Discourse. In: Journal of
Pragmatics, Vol. 12, 1988, pp. 601-638
[Melcuk87] Melcuk, Igor A.; Polguère, Alain: A
Formal Lexicon in the Meaning-Text Theory (or
How to Do Lexica with Words). In: CL, Volume
13, Numbers 3-4, July-December 1987
[Reichman85] Reichman, Rachel: Getting
Computers to Talk like You and Me. Cambridge, Mass.
1985
[Scha/Bruce/Polanyi87] Scha, Remko J.H.;
Bruce, B.C.; Polanyi, Livia: Discourse
Understanding. In: Shapiro, S. C. (Ed. in chief); Eckroth,
D. (manag. editor): Encyclopedia of Artificial
Intelligence. New York/Chichester/Brisbane/
Toronto/Singapore: 1987, pp. 233-245
