Hierarchy of Salience and Discourse Analysis and Production
Eva Hajičová, Petr Kuboň and Vladislav Kuboň
MFF - Linguistics
Charles University
Malostranské nám. 25
CS-118 00 Prague
Abstract 
The hierarchy of salience of the items 
of the knowledge assumed by the speaker to 
be shared by him and by the hearer
constitutes one aspect of a dynamic account
of discourse (Sect. 1). It is claimed that a
representation of this hierarchy is a good
support for discourse analysis (reference
assignment, Sect. 2) and for discourse
production (pronominalization, definite
description, Sect. 3).
1.1 In studying communication, one must
distinguish between the speaker's own image
of the world, the hearer's image, and the
assumptions the speaker has made about the
hearer's image of the world. In the very
process of discourse, these images undergo
changes of different kinds: new objects,
relations, etc. are added to the repertoire
on the basis of the content of what has just
been said; the universe of discourse may be
restricted in that a certain set of
phenomena of a particular kind is marked as
relevant for further discourse whereas the
other elements are disregarded; or the
salience (activation, foregrounding) of the
items is changed, in the sense of their
being easily accessible in memory (see
Sgall, Hajičová and Panevová, 1986,
p. 54f). It has been shown (Hajičová and
Vrbová, 1982; Hajičová, 1987) that the
changes of salience depend to a great extent
on the topic/focus articulation of the
utterance. As a matter of fact, most
algorithms for anaphora resolution work with
the notion of salience (cf., e.g., Hobbs,
1976; Sidner, 1979; Brennan, Friedman and
Pollard, 1989); however, while in most of
these approaches the degrees of salience are
given only syntactically, the hierarchy of
activation in our model is also determined
by the topic/focus
articulation of the sentence. 
Leaving aside the distinction between
the contextually bound and non-bound
elements of the utterance within its topic
and focus parts (for the relevance of
contextual boundness in this respect, see
Hajičová, Hoskovec and Sgall, in press),
these relationships can be summarized as
follows:
(i) the items referred to in the focus
of the utterance, be it by a noun phrase or
by a stressed pronoun, receive the highest
degree of salience;
(ii) the items referred to by a noun 
phrase in the topic part of the utterance 
are activated one degree less than the items 
referred to in the focus part; 
(iii) a pronominal reference to an item
in the topic part of the utterance keeps the 
activation unchanged; 
(iv) the activation of the items not 
mentioned in the given utterance fades away; 
the fading is steeper if the given item was 
the most activated item after the preceding 
utterance and less steep if the given item 
preserved high salience for some of the 
previous utterances, being mentioned in its 
topic. 
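Rules (i)-(iv) can be sketched as a small update function. This is only a minimal illustration under assumptions of our own: the numerical scale (`MAX_DEGREE`), the two fading steps, the treatment of a pronoun whose referent is not yet in the stock, and the names `Mention`, `Activation` and `update_salience` are all hypothetical; the text itself fixes no concrete values.

```python
from typing import Dict, List, NamedTuple

MAX_DEGREE = 10        # assumed top of the activation scale
FADE_AFTER_FOCUS = 2   # assumed steeper fading step for rule (iv)
FADE_AFTER_TOPIC = 1   # assumed slower fading step for rule (iv)


class Mention(NamedTuple):
    item: str
    part: str   # "topic" or "focus"
    form: str   # "np" or "pronoun"


class Activation(NamedTuple):
    degree: int       # higher means more salient
    last_part: str    # part of the utterance in which the item was last mentioned


def update_salience(state: Dict[str, Activation],
                    mentions: List[Mention]) -> Dict[str, Activation]:
    """Return the activation state after one utterance."""
    new_state: Dict[str, Activation] = {}
    for m in mentions:
        if m.part == "focus":
            # Rule (i): items in the focus get the highest degree.
            new_state[m.item] = Activation(MAX_DEGREE, "focus")
        elif m.form == "np":
            # Rule (ii): a noun phrase in the topic is one degree lower.
            new_state[m.item] = Activation(MAX_DEGREE - 1, "topic")
        else:
            # Rule (iii): a pronoun in the topic keeps the degree unchanged.
            # Assumption: an unknown item is treated like a topic NP.
            previous = state.get(m.item)
            degree = previous.degree if previous else MAX_DEGREE - 1
            new_state[m.item] = Activation(degree, "topic")
    for item, act in state.items():
        if item not in new_state:
            # Rule (iv): unmentioned items fade, faster after a focus mention.
            step = FADE_AFTER_FOCUS if act.last_part == "focus" else FADE_AFTER_TOPIC
            new_state[item] = Activation(max(0, act.degree - step), act.last_part)
    return new_state
```

Calling `update_salience` once per utterance, with the mentions of that utterance, yields the successive states of the hierarchy; an item introduced in the focus and never mentioned again loses its salience faster than one kept in the topic.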
We do not attempt to cover VP-anaphora,
since in the model of the stock of knowledge
assumed by the speaker to be shared by him
and the hearer (SSK) we work - for the time
being - only with mental images of objects,
rather than with those of events.
1.2 Several thresholds can be
established on the hierarchical structure of
the activated part of SSK; at least two of
them are important in the context of the
present paper. One threshold characterizes
those items of the SSK that are activated
to such an extent that they can be referred
to in the topic part of the following
utterance; this is to say that the salience
of these items is large enough for the
hearer to identify their referents easily.
The second (higher) threshold delimits that
part of SSK the items of which can be
referred to by pronouns; their salience is
assumed by the speaker to be large enough
for the hearer to assign the reference in a
straightforward way. 
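The two thresholds can be pictured as simple cut-off values on the activation scale. The sketch below is only an illustration: the constants `TOPIC_THRESHOLD` and `PRONOUN_THRESHOLD`, their values, and the function name are assumptions of ours, since no concrete values are given here.

```python
TOPIC_THRESHOLD = 4     # assumed value of the first (lower) threshold
PRONOUN_THRESHOLD = 7   # assumed value of the second (higher) threshold


def reference_options(degree: int) -> dict:
    """Report which kinds of reference a given activation degree licenses."""
    return {
        # Salient enough to be referred to in the topic of the next utterance.
        "topic_reference": degree >= TOPIC_THRESHOLD,
        # Salient enough to be referred to by a pronoun.
        "pronoun": degree >= PRONOUN_THRESHOLD,
    }
```

Because the second threshold is the higher one, every item that can be pronominalized can also be referred to in the topic, but not the other way round.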
1.3 The representation of the discourse
in terms of the hierarchy of activation of
the elements of SSK lends itself to
splitting the discourse into segments; the
segments correspond to those parts of the
discourse for which there is a
characteristic grouping of the most
activated items. These most activated items
in each segment can then be regarded as the
"topic" of the given segment; items which
may be understood as the "topic(s)" of the
discourse can then be computed on the basis
of the "topic(s)" of the segments.
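One possible reading of this segmentation idea is sketched here. It is not a procedure given in the text: we assume, purely for illustration, that each utterance comes with the set of its most activated items and that a new segment starts whenever that set shares nothing with the items characterizing the current segment. The function name and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Set


def segment_discourse(top_items_per_utterance: List[Set[str]]) -> List[Dict]:
    """Group consecutive utterances that share some most activated item."""
    segments: List[Dict] = []
    current = None
    for index, top in enumerate(top_items_per_utterance):
        top = set(top)
        if current is not None and current["topic"] & top:
            # Same segment: extend it and keep only the shared items.
            current["end"] = index
            current["topic"] &= top
        else:
            # No shared item: open a new segment (assumed boundary criterion).
            current = {"start": index, "end": index, "topic": top}
            segments.append(current)
    return segments
```

The `topic` set left in each segment plays the role of the "topic" of that segment; the "topic(s)" of the whole discourse could then be computed from these sets, for instance by counting how many segments each item characterizes.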
The ideas outlined in Sect. 2 and 3 
will be illustrated by an analysis of 
multifarious examples; the results of those 
sections will serve as a theoretical base 
for further practical applications in 
various systems. 
There are two competitors in the first 
sentence, both NP's. The antecedent of the 
ellipsis in the second sentence is the 
subject of the first sentence. 
The antecedents of relative pronouns are
easy to compute, too. A thorough
investigation of a large number of technical
texts showed that relative pronouns almost
always (in about 90-95% of cases) refer to
the head of the closest preceding NP which
has the appropriate morphological categories
(gender, number, etc.):
"Používáme disk z polykarbonátu, který jsme
předem očistili."
"We use a disk of polycarbonate which we
have cleaned before."
An important role is also played by the
tendency to preserve the syntactic
dependency hierarchy in referring - the
antecedent of a pronoun in a subordinate
clause is to be found on a higher or equal
layer of the hierarchy:
"Program, užívaný systémem s rutinou, která
mu pomáhá, může..."
"The program used by the system with the
routine which helps it, can..."
Here, the pronoun "it" refers to "program" 
rather than to "system". 
2.1 SSK can help to solve referent
assignment in discourse analysis. To show
how SSK is best applied alongside the other
usual methods of solving this problem, we
should first of all recall the assignment
based on syntactic relations:
In Czech coordinated clauses the subject
of the second (or third, fourth etc.) clause
is usually deleted. It is then (more or
less) unambiguously understood to be the
same as the subject of the first clause. The
same holds for two successive sentences,
e.g.:
"Digitalizace je velmi populární trend.
(Ø) Stává se symbolem kvalitního záznamu pro
posluchače."
"Digitalization is a very popular trend.
It becomes the symbol of recording quality
for listeners."
2.2 When working within the framework
of the functional generative description
(see Sgall, Hajičová and Panevová, 1986),
the resolution of anaphora can be supported
by the topic-focus articulation and the
hierarchy of activation of the items of the
SSK.
A good aid in finding the pronoun's
antecedent is the form of the pronoun used
in the text. The strong form of a pronoun
(ten, tento = this; sebe = himself;...)
refers - in technical texts almost
unambiguously - to the focus of the
preceding sentence; the weak (unstressed)
form preferably refers to the topic:
"Nejslabším článkem v celém řetězu je vstup.
Ten je příčinou mnoha problémů."
"The weakest link in the whole chain is
the input. This causes a lot of problems."
/the strong form "this (ten)" refers to
"input"/;
"Systém vyvolává rekursivní program. Můžete
ho užít,..."
"The system calls a recursive program. You
can use it,..."
/"it" refers primarily to "system"/;
The antecedent is not "DAT players", which 
can be computed only on the basis of factual 
knowledge - if you know DAT's are newer than 
CD's. 
These strategies are relatively reliable 
(80-85%) and can be used in discourse ana- 
lysis. 
2.3 When we take into account also 
other aspects of the role of SSK in dis- 
course analysis, we can base the algorithm 
of reference assignment on the following 
strategies: 
(1) if the subject of the sentence has
a null form, the subject of the preceding
clause is referred to, as long as the
grammatical agreement is preserved;
(2) in case of a relative pronoun we
try to find the head of the closest
preceding noun phrase as the antecedent;
(3) if the referring expression is a
weak pronoun, we look for the antecedent in
the topic of the preceding clause; in case
of a strong pronoun (or an "adjective
pronoun" in a noun phrase such as "this
man") we investigate the focus;
(4) if there are several competitors
after step (3), or if none of the steps (1)
through (3) can be used, we apply SSK in the
form of a list of NP's from the preceding
text (from the beginning of the current
paragraph) with their respective degrees of
activation and choose the most activated
item with congruent morphological
categories. If we cannot find an item
"activated enough" (the concrete value, or
the difference of values, is to be
determined independently of the way the
activation is evaluated), we prefer leaving
the anaphora unresolved in order to prevent
wrong solutions of "global" references (to a
preceding clause, sentence, an action
identified by a verb, a coordination of
items etc.) or of references which cannot be
solved without the use of semantics, such as
the "DAT players" example.
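Strategies (1)-(4) can be put together roughly as follows. The sketch is an illustration under assumptions of our own, not the implemented algorithm: the data structures, the names `NounPhrase`, `Context` and `resolve`, the value `ACTIVATED_ENOUGH` and the margin `MIN_MARGIN` are hypothetical, and in step (4) we assume that the choice is restricted to the competitors left by step (3) whenever there are any.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

ACTIVATED_ENOUGH = 5   # assumed minimal activation for a safe choice
MIN_MARGIN = 1         # assumed minimal lead over the runner-up


@dataclass(frozen=True)
class NounPhrase:
    head: str
    gender: str
    number: str


@dataclass
class Context:
    subject: Optional[NounPhrase]        # subject of the preceding clause
    topic: List[NounPhrase]              # NPs in its topic
    focus: List[NounPhrase]              # NPs in its focus
    preceding_nps: List[NounPhrase]      # linear order, closest NP last
    activation: Dict[NounPhrase, int]    # SSK list for the current paragraph


def agrees(np: NounPhrase, gender: str, number: str) -> bool:
    return np.gender == gender and np.number == number


def resolve(kind: str, gender: str, number: str, ctx: Context) -> Optional[NounPhrase]:
    """kind is 'null_subject', 'relative', 'weak' or 'strong'."""
    competitors: List[NounPhrase] = []
    if kind == "null_subject":                          # strategy (1)
        if ctx.subject and agrees(ctx.subject, gender, number):
            return ctx.subject
    elif kind == "relative":                            # strategy (2)
        for np in reversed(ctx.preceding_nps):
            if agrees(np, gender, number):
                return np
    elif kind in ("weak", "strong"):                    # strategy (3)
        part = ctx.topic if kind == "weak" else ctx.focus
        competitors = [np for np in part if agrees(np, gender, number)]
        if len(competitors) == 1:
            return competitors[0]
    # Strategy (4): fall back on the activation degrees stored in SSK.
    pool = competitors or [np for np in ctx.activation if agrees(np, gender, number)]
    ranked = sorted(pool, key=lambda np: ctx.activation.get(np, 0), reverse=True)
    if not ranked or ctx.activation.get(ranked[0], 0) < ACTIVATED_ENOUGH:
        return None                                     # leave unresolved
    if len(ranked) > 1:
        lead = ctx.activation.get(ranked[0], 0) - ctx.activation.get(ranked[1], 0)
        if lead < MIN_MARGIN:
            return None                                 # no clear winner
    return ranked[0]
```

Returning `None` corresponds to leaving the anaphor unresolved, which the strategy prefers to a wrong assignment.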
3. Discourse production has more
freedom than analysis, because the speaker
can choose the means for expressing his
ideas. Of course, he has to take care that
the hearer can interpret the text easily
and, if possible, unambiguously; at the
same time, he should not repeat definite
NP's unnecessarily. The main criterion in
the speaker's choice between the use of a
pronoun and a definite NP may be the current
state of SSK. We deal with technical texts
only, but we believe the basic ideas hold
for other types of texts as well.
3.1 When producing a sentence of a 
continuous text, the speaker can use three 
types of referring expressions - weak pro- 
nouns, strong pronouns (including the demon- 
strative and relative ones) and more or less 
complex definite expressions (compare "John" 
with "the boy who played with a ball yester- 
day as I have told you..."). Depending on 
the actual state of SSK he chooses the 
relatively "weakest" means (from a weak 
pronoun to a complex description) the use of 
which enables the hearer to find the refer- 
ent correctly. Two aspects of SSK are im- 
portant in this choice: 
(a) the degree of activation da(O)
of the object (referent) O in SSK - an
important role is played by the minimal
degree of activation (MIN), i.e., the
threshold below which it is not possible to
refer to objects by pronouns (see 1.2);
(b) the existence of "competitors" -
i.e. objects differing in activation only
slightly (for the degree, see Hajičová and
Vrbová, 1982) and having the same
morphological categories.
"Přehrávače DAT jsou mnohem dražší než CD
přehrávače. Toto nové zařízení ještě výrobci
nebylo přijato."
"The DAT players are much more expensive
than CD players. This new device is not yet
accepted by producers."
3.2 We claim that against the background
of these two aspects we can distinguish the
following four cases in discourse
production:
(i) da(O) < MIN (as a special case this
holds for "new objects"):
In a technical text, the speaker prefers to
use a definite NP. The degree of its
complexity depends on the presence of
possible competitors.
"Vstupní data (O) se mění pomocí programu
D-TYPE. Zpočátku vyvolává subrutinu D-START,
která ukládá data (O) do paměti."
"The input data (O) are changed by the
D-TYPE program. In the beginning it calls
the D-START subroutine which loads the data
(O) into the memory.";
(1) "... Ø (O2) Nemají stejnou souborovou
strukturu."
"... They (O2) do not have the same file
structure."
(2) "... Ø (O2) Používají je (O1) k ..."
"... They (O2) use them (O1) for...";
(b) the expression referring to O1 has the
position of subject in C:
When referring to O1, a strong pronoun has
to be used; in case of O2 a weak pronoun
will do.
(ii) da(O) > MIN and the object O has no
competitor or the competitor is
"far enough":
A weak pronoun can be used in this case.
"Vstupní data (O) se mění pomocí programu
D-TYPE. Mění je (O) na speciální typ."
"The input data (O) are changed by the
D-TYPE program. It transforms them (O) into
a special type.";
(iii) da(O1) > MIN, the object O1 has a
competitor O2, neither of them
having the maximum degree of
activation (MAX):
In this case a pronoun does not help. A
definite NP (at least for one object) has to
be used.
"Oba systémy (O2) sdílejí některé soubory
(O1). Ty (O1) jim (O2) pomáhají k ..."
"Both systems (O2) share some files (O1).
Those (O1) help them (O2) to ...";
(c) cases (a), (b) do not hold:
In this situation the competition cannot be
"solved" by syntactic means. The solution of
the problem is the same as in (iii).
"Oba systémy (O2) sdílejí některé soubory
(O1). Programátoři užívají systémy (O2)/
soubory (O1) k ..."
"Both systems (O2) share some files (O1).
The programmers use the systems (O2)/files
(O1) to..."
"Oba systémy (O1) se liší v utilitách (O2).
Rozdíl se projeví, pokud se pokusíme
systémy (O1)/utility (O2) odstartovat."
"Both systems (O1) differ in utilities (O2).
The difference will show up if we try to
start the systems (O1)/utilities (O2).";
(iv) da(O1) = MAX, da(O2) = MAX-1 and O1
competes with O2:
This is the most difficult situation. We can
divide it into three subcases according to
the way referring expressions are used in
the following clause (sentence) C:
3.3 As we have already stated, our 
study is the first step on the way to a 
complex account of the impact of SSK in dis- 
course production. To handle the interplay 
between pronouns and definite NP's in all 
details, one has to state the relevant dif- 
ferences in the activation of competitors 
(in various types of sentences), to consider 
the possibility of the marked use of strong 
pronouns and definite NP's and many other 
problems. 
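The four cases of Sect. 3.2, with the three subcases of (iv), can be summarized in a small decision function. It is only a sketch under assumptions of our own: the values of `MIN`, `MAX` and `COMPETITOR_GAP`, the treatment of an object with more than one competitor, and the name `choose_expression` are hypothetical and serve the illustration only.

```python
from typing import Dict, Optional

MIN = 5              # assumed threshold for pronominal reference
MAX = 10             # assumed maximum degree of activation
COMPETITOR_GAP = 1   # assumed largest difference that still counts as competition


def choose_expression(target: str, next_subject: Optional[str],
                      activation: Dict[str, int],
                      categories: Dict[str, str]) -> str:
    """Return the weakest means for referring to `target` in the next clause C.

    `next_subject` is the object referred to by the subject of C.
    """
    da = activation[target]
    if da < MIN:
        return "definite NP"                               # case (i)
    rivals = [o for o in activation
              if o != target
              and categories[o] == categories[target]
              and abs(activation[o] - da) <= COMPETITOR_GAP]
    if not rivals:
        return "weak pronoun"                              # case (ii)
    if len(rivals) == 1:
        by_degree = {da: target, activation[rivals[0]]: rivals[0]}
        if set(by_degree) == {MAX, MAX - 1}:               # case (iv)
            o1, o2 = by_degree[MAX], by_degree[MAX - 1]
            if next_subject == o2:                         # subcase (a)
                return "weak pronoun"
            if next_subject == o1:                         # subcase (b)
                return "strong pronoun" if target == o1 else "weak pronoun"
            return "definite NP"                           # subcase (c)
    return "definite NP"                                   # case (iii)
```

For the sentence "Both systems (O2) share some files (O1)", with the files at MAX and the systems at MAX-1, the function returns a strong pronoun for the files and a weak pronoun for the systems when the files are the subject of the next clause, in line with "Those (O1) help them (O2) to ...".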
(a) the expression referring to O2 has the
position of subject in C:
Here we face the "subject-preserving
tendency", which is very common in
continuous texts. This helps to avoid the
possible ambiguity between competitors, so
that weak pronouns can refer to both objects
(O1, O2).
"Oba systémy (O2) se liší v utilitách (O1).
"Both systems (O2) differ in utilities (O1).

References 

Brennan S.E., Friedman M.W. and Pollard C.J.
(1989), A Centering Approach to
Pronouns, manuscript

Carter D.N. (1985), Common Sense Inference
in a Focus-guided Anaphor Resolver.
In: Journal of Semantics 4, 237-246.

Hajičová E. (1987), Focussing - A Meeting
Point of Linguistics and Artificial
Intelligence. In: Artificial
Intelligence II, ed. by Ph. Jorrand and
V. Sgurev

Hajičová E., Hoskovec T. and Sgall P. (in
press), Discourse Modelling Based on
Hierarchy of Salience; to appear in
Prague Studies in Mathematical
Linguistics 11

Hajičová E. and Vrbová J. (1982), On the
Role of the Hierarchy of Activation in
the Process of Natural Language
Understanding. In: Coling 82, ed. by
J. Horecký, Amsterdam: North Holland,
107-113

Hobbs J.R. (1976), Pronoun Resolution.
In: Techn. Rep. 76-1, Dept. of Computer
Science, City College, CUNY

Sgall P., Hajičová E. and Panevová J.
(1986), The Meaning of the Sentence
in Its Semantic and Pragmatic Aspects.
Prague: Academia - Dordrecht: Reidel

Sidner C.L. (1979), Towards a Computational
Theory of Definite Anaphora
Comprehension in English Discourse.
In: TR-537, M.I.T. Artificial
Intelligence Laboratory

Walker M.A. (1989), Evaluating Discourse 
Processing Algorithms, manuscript 
