The Computation of the Informational Status of Discourse Entities 
Soenke Ziesche 
University of Hamburg, Department of Computer Science, 
Knowledge and Language Processing Group and Doctoral Program in Cognitive Science 1 
Vogt-Koelln-Str. 30, D-22527 Hamburg, Germany 
Email: ziesche@informatik.uni-hamburg.de 
Summary 
During language production, processes of 
information structuring constitute a relevant part. 
These processes are regarded as a mapping from 
a conceptual structure to a perspective semantic 
structure. I will focus on one aspect of 
information structuring, namely the 
verbalization of the current mental 
representation of entities. For this verbalization, 
the informational status of the entities is used. 
This property is expressed by different means in 
different languages. My approach constitutes a 
cognitively oriented and highly context- 
dependent model for computing the 
informational status embedded in a concept-to- 
speech model of language production. The 
processes are illustrated by examples taken from 
the implementation. 
1. Introduction 
Vallduvi (1990) postulated that not only a 
propositional content is transferred by an 
utterance, but also an instruction for the hearer 
to extract this propositional content. By the 
process of information structuring, the speaker 
intends to create the conditions for realizing this 
utterance within the set of possible utterances 
which presumably offers the hearer the most 
efficient way for grasping the underlying 
proposition. Information structuring is highly 
context-dependent and can be divided into many 
subprocesses. One important aspect is the 
current mental representation of entities marked 
by the informational status. 
The informational status refers to an absolute 
property which depends only on contextual 
criteria of salience, like knowledge and 
consciousness. Hence, discourse entities 
permanently have a certain informational status, 
even if they are not verbalized within the current 
utterance. 
The computations on information structuring 
constitute an adjustment of parameters 
determining the realization of a felicitous 
utterance in the following processes. Concerning 
the informational status, this means that all 
verbalized entities within an utterance have to be 
marked in a way that the hearer can easily 
identify these entities. 
The putative informational status on the part of 
the hearer is manifested in various ways in 
different languages: accentuation, pronominal or 
lexical coding, or definite or indefinite marking. 
Lambrecht (1994) mentioned the problem that 
the attitudes marking the current mental 
representation are in principle a matter of degree 
whereas the linguistic possibilities of 
manifestation are partly discrete, e.g. determiner. 
Hence, for modeUing the mapping from an 
informational status to a linguistic manifestation, 
it seems useful to employ discrete taxonomies 
structuring the informational status. The most 
important taxonomies are provided by Prince 
(1981) and by Lambrecht (1994), based on 
Chafe (1987). 
In section 2 the computation of the informational 
status based on these taxonomies is described. 
This is followed by an example in section 3. 
2. The computation of the informational 
status 
The computation of the informational status is 
modelled within the framework of the 
SYNPHONICS system 2 which is particularly 
suitable for this task due to its very detailed 
representation of context. This concept-to- 
speech approach on modelling language 
production is cognitive, combining results from 
research on psycholinguistics, on theoretical 
linguistics as well as on computational 
linguistics. 
Due to psycholinguistic evidence, the properties 
modular, incremental, parallel, monotonous and 
robust are assumed for the model. It consists of 
the three central processing units 
Conceptualizer, Formulator, and Articulator (cf. 
Levelt (1989)). Recent findings in theoretical 
linguistics are taken into account by encoding 
semantic, syntactic, and phonological informa- 
tion declaratively in a special variant of HPSG 
53 
for German. In addition, it is a computational 
linguistic approach using methods suitable for 
implementation: linguistic objects are 
represented as typed feature structures and are 
processed by unification. 
The SYNPHONICS-system operates on a context 
structure 3 which contains four ola¢~es: 
• The discourse knowledge comprises the 
relevant parts of the previous discourse. 
• The perceived knowledge consists of the 
information the interlocutors perceive during 
the utterance situation besides speech 
comprehension, e.g. visual, tactile or further 
auditory perceptions. 
• The hearer knowledge contains the 
knowledge relevant for the current utterance 
which the speaker assumes the hearer 
already to have beforehand. 
• The inferrable knowledge consists of the 
relevant knowledge potentially inferrable 
from the remaining context-classes by 
means of common sense and sufficient 
knowledge of the currently spoken language. 
This means that knowledge is considered 
which is not made directly available by the 
discourse or by sense-organs, but indirectly 
by means of reasoning. 
The main data structure on the conceptual level, 
where the computation of the informational 
status takes place, are so-called "referential 
objects" ("refo") based on Habel (1986). A refo 
is modelled by a typed feature structure 
consisting, among other, of the features 
"predications" which comprise a set of 
conceptual, i.e. preverbal, predications and 
"pointer" which establishes an address used for 
reference. 
The Conceptualizer creates a bipartite output 
stream which consists of an incremental 
conceptual structure CS comprising the 
propositional content of the intended utterance 
and a contextual structure CT with the currently 
relevant parts of the contextual environment. 
Both CS and CT are composed of refos. 
Afterwards, for every refo-increment of the CS- 
stream the informational status is computed. 
The general principle of the computation of the 
informational status is the following, based on 
the current CT. 
Principle of informational status: A refo is 
assigned to a certain informational status 
depending on which - if any - contextual classes 
contain this refo. 
Constel- 
lation 
Discourse 
knowledge 
Perceived 
knowledge 
Hearer 
knowledge 
Inferrable 
knowledge 
Prince ( 1981 ) Chafe ( 1987)/ 
Lambrecht (1994) 
I brand_new, new unidentifiable 
II + inferrable accessible 
III + - unused, new inactive 
IV + + inferrable accessible 
V + i - sitevoked, evoked active 
VI - \[+ + # # 
l 
VII - + + sit_evoked, evoked active 
VIII - + + + # # 
IX i + text_evoked, evoked active 
I X l+ + # # 
XI + + - text_evoked, evoked active 
XII + - + + # # 
XIII + + - - evoked active 
XIV + + + # # 
XV + + + - evoked active 
+ + + XVI + # 
Table (1): Possible distributions of the refos within the contextual classes. 
54 
The computation of the informational status is 
implemented based on the ALE-formalism 
(Attribute Logic Engine, cf. Carpenter (1992)). 
For four contextual classes, 16 constellations are 
possible, as illustrated in table (1). Ten of these 
constellations are realistic; they are assigned to 
the informational status according to the 
taxonomy of Prince (1981) in the sixth column 
and according to the taxonomy of Lambrecht 
(1994) based on Chafe (1987) in the seventh 
column, respectively. Compared to that, the 
constellations VI), VIII), X), XII), XIV), and 
XVI) are nonsensical because it is not necessary 
to infer knowledge which is available in an 
explicit way in discourse or perceived 
knowledge. Hence, refos contained in the 
discourse or perceived knowledge can not be 
simultaneously elements of the inferrable 
knowledge. 
Afterwards, besides the descriptive information 
of the refo which should be verbalized, the 
computed informational status is handed as 
referential information to the lemma-selector, a 
submodule of the SYNPHONICS-formulator. The 
lemma-selector chooses with this information 
suitable lemmata of the lexicon's lemma- 
partition which guarantee besides descriptive 
adequacy referential adequacy of the linguistic 
increment. In German for instance, articles or 
pronouns are used for that. These structures are 
then mapped onto the content value of the 
corresponding HPSG sign. 
3. Examples and refinement 
In this section I will provide a concrete example 
for illustration followed by a refinement of the 
rules. The example concerns the informational 
status "textually evoked" (see (1)): 
(1) A book lay on the table. Mary saw it. 
A refo is assigned to the informational status 
"textually evoked" if the discourse knowledge 
contains this refo, but neither the perceived 
knowledge nor the inferrable knowledge do so, 
whereas the hearer knowledge does not have to 
be taken into account (cf. table (1): constellation 
IX) and XI)). Concerning the refo representing 
the book in the second sentence in (1), the 
Prolog clause in figure (1) is applied because the 
comparison of the pointer-values confirms these 
constellations. 
I will close with the analysis of a special 
phenomenon pointed out by Lambrecht 
(1994:80). 
(2) Mary is looking for a book. 
(2) can be uttered to refer to a specific book as 
well as to a non-specific book. The difference is 
revealed by continuing either by (3a) or by (3b): 
(3) a) She found it. 
b) She found one. 
While the anaphoric expression in (3a) is easy to 
explain because it is referred exactly to the same 
entity as in the preceding utterance in (2), the 
case in (3b) is more difficult. The anaphoric 
expression "one" refers to a concrete entity 
which is new, therefore only identifiable for the 
speaker but not for the hearer. The question 
arises why this anaphoric expression is 
nevertheless felicitous. It seems that the entity is 
activated due to the activation of the category it 
belongs to. In this non-specific case, the 
category "book" is activated in (2) and therefore 
obviously any instances of this category are 
activated, too. This means that the informational 
status "textually evoked" has to be divided into 
two cases "textually evoked/specific" and 
"textually evoked/non-specific". 
My approach is able to handle this phenomenon 
by the following clause in figure (2) whereas the 
clause in figure (1) strictly speaking determines 
the informational status "textually 
evoked/specific". A refo is assigned to the 
informational status "textually evoked/non- 
informational_status(object_refo, context, info_status). 
informational_status(pointer:A, (discourse_knowledge:B, perceived_knowledge:C, 
inferrable_knowledge:D), text_evoked) if 
element(pomter:A,B), 
not element(pointer:A,C), 
not element(pointer:A,D). 
Fig. (1): prolog clause concerning the informational status "textually evoked" 
55 
informational_status(object_refo, context, info_status). 
informational_status(predications: A, (discourse_knowledge:C, perceived_knowledge:D, 
inferrable_knowledge:E), text_evoked/non-specific) if 
element((predications: A, pointer: var),C), 
not element((predications: A, pointer: var),D), 
not element((predications: A, pointer: var),E). 
Fig. (2): prolog clause concerning the informational status "textually evoked/non-specific" 
specific" if it is an instance of a category which 
is only given by the discourse knowledge, so far. 
By the clause in figure (2), it is checked whether 
only the discourse knowledge contains a refo 
which has the same predication set as the refo 
which should be verbalized but an 
underspecified pointer Cvar"). This refo 
represents the category with this predication set 
(e.g. "book"), but no special instance. 
4. Conclusion 
I have introduced a cognitively oriented 
approach for modelling a phenomenon within 
the processes of information structuring, namely 
the informational status of discourse entities. 
Information structuring creates conditions for 
producing the most felicitous utterance within 
the set of all possible utterances. One important 
parameter of information structuring is the 
informational status. This value is expressed by 
different means in different languages for 
marking verbalized entities in a felicitous way, 
i.e. so that the hearer gets the intended reference. 
Hence, a precise computation of the 
informational status is a crucial subprocess 
within the whole process of utterance 
production. Accordingly, I have described and 
illustrated an implemented algorithm based on a 
detailed representation of context. 

References 
Abb, B.; Gfinther, C.; Herweg, M; Lebeth, 
K.; Maienborn, C.; Schopp, A. (1995). 
Incremental syntactic and phonological encoding 
- an outline of the SYNPHONICS-Formulator. 
In: G. Adorni & M. Zock (eds.): Trends in 
natural language generation: an artificial 
intelligence perspektive. Berlin: Springer. 
Carpenter, B. (1992). The logic of typed feature 
structures. Cambridge, Cambridge University 
Press. 
Chafe, W.L. (1987). Cognitive constraints on 
information flow. In: R.S. Tomlin (ed.): 
Coherence and grounding in discourse. 
Amsterdam/Philadelphia: John Benjamins, 21- 
51. 
Giinther, C., A. Schopp, S. Ziesche (1995). 
Incremental computation of information 
structure and its empirical foundation. In: 
Proceedings of Fifth European Workshop on 
Natural Language Generation, Leiden, 181-205. 
Habel, C. (1986). Prinzipien der Referential#i#. 
Berlin: Springer. 
Lambrecht, K. (1994). Information structure 
and sentence form. Cambridge: CUP. 
Levelt, W.J. (1989). Speaking: from intention to 
articulation. Cambridge, Mass.: MIT Press. 
Prince, E.F. (1981). Toward a taxonomy of 
given/new information. In P. Cole (ed.): Radical 
Pragmatics. New York: Academic Press, 223- 
255. 
Vallduvi, E. (1990). The informational 
component. PhD Thesis, University of 
Pennsylvania. 
Ziesche, S. (1995). Formalization of context 
within SYNPHONICS and computations based 
on it. In: Proceedings of the IJCAl-95-Workshop 
"Context in Natural Language Processing", 
Montrral, 171-179. 
