FRAMEWORK FOR A MODEL OF DIALOGUE 
Ronan REILLY 
Educational Research Centre 
St Patrick's College, Dublin 
Giacomo FERRARI 
Department of Linguistics 
University of Pisa 
Irina PRODANOF 
Institute for Computational Linguistics - CNR 
Pisa 
1 INTRODUCTION
In this paper we present a general model of
communication applied to the special case of
dialogue. Our broad perspective aims to account for
the many facets of human dialogue within a single
theoretical framework. In particular, our project's
aim of incorporating relevant non-verbal
communicative acts from the person-machine interface
makes it essential that the description of
communication be sufficiently broad.
The model described here takes as its starting point 
the communicative utterance or act. It considers the 
higher-order structures into which communicative acts 
may be incorporated, but does not detail their 
internal composition. It is in this sense that the 
model provides a framework for the formal treatment
of dialogue. 
2 COMPONENTS OF THE MODEL 
A full description of the adopted dialogue model has 
been given in Egan, Ferrari, Harper, et al. (1987). 
It relies on a double description of dialogue: a
syntactic analysis of dialogue structure and a 
semantic-pragmatic description of the communication 
context. The basic units are: 
- meaningful expression (ME): Any physical act 
carrying a non-contextual meaning; 
- communicative act (CAct): An instance of ME 
issued by a specific "issuer" and received by a 
specific "receiver"; 
- communicative situation (CS): The CAct together
with all the relevant facets;
- communicative situation structure (CSS): A larger
aggregation of CSs that provides a bridge into the
intentional component of the dialogue model.
Each of these components is discussed in more detail 
below. 
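The four basic units can be pictured as nested records. The following is only an illustrative sketch: the class and field names (form, meaning, facets, purpose) are assumptions made for this example and are not taken from the model's own specification.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MeaningfulExpression:
    """ME: a physical act carrying a non-contextual meaning."""
    form: str      # the words uttered, or a description of a gesture
    meaning: str   # placeholder for the context-independent meaning


@dataclass(frozen=True)
class CommunicativeAct:
    """CAct: an instance of an ME with a specific issuer and receiver."""
    expression: MeaningfulExpression
    issuer: str
    receiver: str


@dataclass
class CommunicativeSituation:
    """CS: communicative act(s) together with the relevant facets."""
    acts: list = field(default_factory=list)
    facets: dict = field(default_factory=dict)


@dataclass
class CommunicativeSituationStructure:
    """CSS: an aggregation of CSs, linked to a purpose (assumed to be a string)."""
    situations: list = field(default_factory=list)
    purpose: str = ""


me = MeaningfulExpression("Here is the table.", "present(table)")
act = CommunicativeAct(me, issuer="system", receiver="user")
cs = CommunicativeSituation([act], {"mode": "screen", "illocution": "informative"})
css = CommunicativeSituationStructure([cs], purpose="tabulate")
```

The same ME could be wrapped in a different CAct if another participant issued it, which is the distinction the definitions above draw between an expression and its use.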
2.1 Communicative Acts and Dialogue Structure 
The syntactic component of the dialogue model relies 
on the fact that, if we examine a dialogue or any 
other communicative exchange, it is possible to 
observe in the sequence of communicative acts, sub- 
sequences which follow regular patterns. These 
patterns can be catalogued in a form which expresses 
their significant regularities. This approach leads 
to a descriptive method very similar to the formal 
description of language in terms of a vocabulary of 
terminal symbols (the communicative acts), a 
vocabulary of auxiliary symbols (a collection of 
labels), and a set of productions (discourse 
patterns). Within the definition of a communicative 
act, provision is made for gestural information 
accompanying an utterance, such as a deictic gesture 
involving a mouse or some other pointing device (in 
the context of person-machine interaction). 
The idea of treating discourse segments like 
phrases in a sentence is not new (cf. Burton, 1981). 
However, the nature of the entities involved is 
rarely fully clarified. In Christie, Egan, Ferrari, 
et al. (1985), a dialogue classification system was 
presented, based on the system of classification of 
Burton (1981). It consisted of a set of functional
labels divided into the following five hierarchical
levels, from lower to higher:
acts: {marker, summons, elicitation, reply, ...}
moves: {delineating, sketching, ...}
exchanges: {explicit, boundary, conversational, ...}
transactions: {exchange, ...}
interactions: {transaction, ...}
The labels at the act level are defined in terms of 
functional labels assigned to expressions, such as 
"starter", structurally realized by a statement, a 
question, or a command; "informative" structurally 
realized by a statement; "elicitation", structurally
realized by a question. These, together with their 
functional definitions, represent a closed set of 
elements. The labels at a higher level are all 
defined in terms of patterns of labels of the 
immediately lower level. This set of rules may be 
regarded as the set of productions, which generates 
communications. In this way, a dialogue/ 
communication is adequately described in terms of a 
formal generative grammar. An ATN-like grammar of 
dialogue in these terms has been described in Egan, 
Forrest, Gardiner et al. (1986) and Reilly (in 
press). 
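To make the grammar analogy concrete, the sketch below recognises a sequence of act labels with a small set of productions. It is a toy context-free recogniser, not the ATN grammar cited above. The move labels (opening_move, answering_move) and the productions themselves are hypothetical, and the right-recursive rules are used here only as a simple way of expressing repetition.

```python
# Hypothetical productions: each label is rewritten as a pattern of
# labels from the level below (plus right recursion for repetition).
PRODUCTIONS = {
    "interaction": [["transaction"], ["transaction", "interaction"]],
    "transaction": [["exchange"], ["exchange", "transaction"]],
    "exchange": [["opening_move", "answering_move"]],
    "opening_move": [["elicitation"], ["marker", "elicitation"]],
    "answering_move": [["reply"]],
}


def ends(symbol, acts, start):
    """Return every position at which `symbol` can end when parsed from `start`."""
    if symbol not in PRODUCTIONS:
        # Terminal symbol: a communicative act label.
        matched = start < len(acts) and acts[start] == symbol
        return {start + 1} if matched else set()
    result = set()
    for pattern in PRODUCTIONS[symbol]:
        positions = {start}
        for part in pattern:
            positions = {e for p in positions for e in ends(part, acts, p)}
        result |= positions
    return result


def is_dialogue(acts):
    """True if the whole act sequence can be derived from 'interaction'."""
    return len(acts) in ends("interaction", acts, 0)


well_formed = is_dialogue(["marker", "elicitation", "reply", "elicitation", "reply"])
ill_formed = is_dialogue(["reply", "elicitation"])
```

Under these toy productions the first sequence is accepted (two exchanges) and the second is rejected, because a reply cannot open an exchange.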
2.2 Communicative Situations
The semantic-pragmatic description relies on the
notion of "communicative situation" (CS). A CS is a 
way of representing the communicative exchange 
together with its context. It consists of facets, 
which are aspects of the CS that occur with a certain 
regularity in all CSs of a given sort. Facets may be
formally conceived of as "sorted regularities" in the
scene where communication takes place; therefore a CS
may be described as
CS = {f_s, f_t, ...}
where the subscripts identify the sort of the facet.
It is relatively easy to identify the sort of the
more frequent regularities, such as who the issuer is
(f_i), who the receiver is (f_r), etc., and to consider
these as constituent elements of a CS, around which 
other facets become, from time to time, relevant. 
Situation Semantics has been shown (cf. Egan,
Ferrari, Harper, et al., 1987) to have some advantages
for the representation and the treatment of a CS, 
provided that certain modifications and extensions to 
the original description of a discourse situation are 
carried out. In communication, since more than one, 
and often more than two, participants are involved, 
each with different attunements to the CS and 
different perceptions of what in Situation Semantics 
is called the speaker's connections, more than one 
classification of the same CS is possible. In the 
best case, where participants understand a CS in the 
same way, communication is successful, otherwise some 
failure occurs. In general, we can assume that
participants in a communicative event are able first 
to classify, and then understand situations on the 
basis of the situation types they share. In the 
spirit of SS, we assume that these CS-types are the
description of regularities observed in actual
communications. An important consequence is that a
new notion, relevance or relevant facet, is
established in terms of the more frequently observed 
regularities. 
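The idea that participants classify a CS against shared CS-types, and that communication succeeds when they classify it in the same way, can be sketched as follows. This is an illustration under simplifying assumptions: facets are reduced to sort/value pairs, and the CS-type names and facet values are invented for the example.

```python
# Hypothetical shared CS-types: each lists the facet values it requires.
CS_TYPES = {
    "database_query": {"mode": "keyboard", "illocution": "elicitation"},
    "greeting": {"illocution": "summons"},
}


def classify(view, cs_types):
    """Return the names of all CS-types whose required facets match this view."""
    return sorted(
        name
        for name, required in cs_types.items()
        if all(view.get(sort) == value for sort, value in required.items())
    )


def communication_succeeds(views, cs_types):
    """Success: every participant's view of the CS gets the same, non-empty classification."""
    classifications = [classify(view, cs_types) for view in views]
    first = classifications[0]
    return bool(first) and all(c == first for c in classifications)


issuer_view = {"issuer": "user", "receiver": "system",
               "mode": "keyboard", "illocution": "elicitation"}
same_view = dict(issuer_view)
misread_view = dict(issuer_view, illocution="informative")

shared = communication_succeeds([issuer_view, same_view], CS_TYPES)
failed = communication_succeeds([issuer_view, misread_view], CS_TYPES)
```

In the second call the receiver takes the question for a statement, so the two classifications differ and the sketch reports a failure.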
We can, then, describe the facets of the
communicative situation in terms of properties of
that situation, where the notion of relevance
intervenes at two levels. At the first level, the
set of properties is not defined a priori. Different
properties are relevant to the interpretation of
different utterances in different situations. Some
of these are involved more frequently than others in
the process of understanding, and may be considered
more fundamental than others to a CS. These seem to
be the roles of issuer, receiver, location,
communication mode, and illocution. The communication
mode, i.e., whether communication happens face-to-face,
by telephone, or in any other way, may affect
both the form of the message and the referring
expressions. By illocution, the traditional
illocutionary force is meant, although a more
fine-grained classification of speech acts is intended
(cf. Christie, Egan, Ferrari, et al., 1985). Also,
other facets of a CS may occasionally become relevant
to the understanding of an utterance.
At the second level, each property of a CS is taken 
to be a role participating in an intersecting set of 
regularities which qualify its sort. Thus, the 
property: 
[x | <<l, saying, x, α>, 1>]
describes some indeterminate x saying α, and
participating in those situations where it is
"regular" (nomic) that some x says α.
By further specification we can assume that
[a-tourist | <<l, saying, a-tourist, α>, 1>]
participates in those situations in which x is of
type a-tourist. In Barwise and Perry's (1983)
notation this would be given as:
[x | in S: a-tourist, x, yes]
where S is the set of situation-types in which a 
tourist is involved. 
Both properties and types classify real objects that 
become relevant to a discourse situation in
accordance with the relations participants are 
attuned to. On the basis of this notion of 
relevance, it is possible to define a large set of 
types of properties which may or may not appear in 
one or the other CS. A receiver makes use of these 
classificatory devices to classify and understand any 
specific CS with which he or she is presented.
Figure 1: Structural components of the model
(labels recoverable from the figure: Focus space,
Speaker, Display, CAct, Act type, Act structure)
2.3 Communicative Situation Structures
The Communicative Situation Structure (CSS) is
equivalent in level of analysis to the discourse
segment of the Grosz and Sidner (1986) model. The
three components of the CSS (see Figure 1) are the
communicative act component (CAct), the communicative
situation component (CS), and certain properties
specific to the CSS itself. A CSS can consist of a
number of CSs, and these in turn can consist of a
number of CActs. The nature of CActs and CSs has
already been discussed above.
A number of factors serve to distinguish one 
communicative situation from another. These can 
involve any change in the context of the dialogue; 
for example, a change in location or a change of 
speaker. In the case of person-machine communication
it is most likely to involve a change in the speaker
or a change in some aspect of the computer's visual
display.
A number of communicative situations go to make up a
CSS. What distinguishes one CSS from another is a
change in the purpose of the CSS. The CSS is also
the repository of information about what entities in
the dialogue are currently in focus. This
information is used in the resolution of anaphora.
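The two boundary criteria (a change of context starts a new CS, a change of purpose starts a new CSS) can be sketched as a two-level grouping. The sketch assumes that each act has already been annotated with its speaker, the state of the display, and a purpose. These annotations, and the key names used for them, are hypothetical, since the model does not say how purposes are recognised.

```python
from itertools import groupby


def segment(acts):
    """Group acts into CSSs (by purpose) and, within each, into CSs (by context)."""
    css_list = []
    for purpose, purpose_group in groupby(acts, key=lambda a: a["purpose"]):
        situations = [
            [a["text"] for a in cs_group]
            for _, cs_group in groupby(
                purpose_group, key=lambda a: (a["speaker"], a["display"])
            )
        ]
        css_list.append((purpose, situations))
    return css_list


ACTS = [
    {"text": "U3", "speaker": "user", "display": "empty", "purpose": "age table"},
    {"text": "S4", "speaker": "system", "display": "table", "purpose": "age table"},
    {"text": "U5", "speaker": "user", "display": "table", "purpose": "course total"},
    {"text": "S6", "speaker": "system", "display": "table", "purpose": "course total"},
]
segments = segment(ACTS)
```

Because groupby only merges adjacent items, a return to an earlier purpose later in the dialogue would open a new CSS rather than reopening the old one, which is a simplification.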
2.4 Structural Relationships
A CSS can be related to another CSS in a limited way.
The relationship can only be hierarchical, and it
represents a route through which information relating
to the focus of attention can be transmitted. If the
focus of attention is on one CSS, definite noun 
phrases and anaphora in general can be resolved 
either from entities in focus within the current CSS 
or from the focus space of a CSS that is connected to 
the current one. 
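A minimal sketch of this resolution strategy is given below. It assumes that "connected" means the immediate parent and children of the current CSS, and that more recently focused entities are preferred. Both are assumptions made for the example, as are the class and function names.

```python
class CSS:
    """A communicative situation structure with a focus space and tree links."""

    def __init__(self, name, focus=(), parent=None):
        self.name = name
        self.focus = list(focus)   # entities currently in focus, oldest first
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def resolve(current, matches):
    """Search the current focus space first, then those of connected CSSs."""
    candidates = [current]
    if current.parent is not None:
        candidates.append(current.parent)
    candidates.extend(current.children)
    for css in candidates:
        for entity in reversed(css.focus):   # most recent entity first
            if matches(entity):
                return entity, css.name
    return None


css_a = CSS("CSS-A", focus=["3 year degree course", "age groups"])
css_b = CSS("CSS-B", focus=["total number"], parent=css_a)

referent = resolve(css_b, lambda entity: "course" in entity)
missing = resolve(css_b, lambda entity: "lecturer" in entity)
```

Here the course is not in the focus space of CSS-B, so it is found through the hierarchical link to CSS-A.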
Figure 2 represents a structured collection of CSSs. 
As can be seen, they consist of a number of tree 
fragments, rather than one large tree. Such a 
situation can occur if the purpose of a dialogue is 
to achieve a number of distinct goals, which cannot 
be integrated under a dominating CSS. 
3 PRAGMATIC DIMENSIONS 
3.1 Attentional State 
The disembodied arrow in Figure 2 represents the 
current focus of attention. The focus of attention 
sets bounds on what are valid targets for anaphoric
reference within a CSS. This focus shifts 
automatically as a new CSS is created. It can also 
be shifted by one or other of the dialogue 
participants explicitly requesting a shift of focus 
back to a previous topic in the dialogue. However, 
there is a constraint put on this shift. When moving 
from one tree fragment to another, the focus of 
attention can only shift to the top-most node of the
target tree. From there, it may traverse the
subordinate nodes of the tree to locate the
appropriate CSS. This restriction reflects the fact
that when a dialogue participant returns to a
previously active topic in the dialogue, he or she
tends to proceed from the general to the specific
aspect of that topic. Traversal of the CSS tree from
top to bottom represents such a transition.
Figure 2: A set of communicative situation structures
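The constraint on shifting between tree fragments can be sketched as a path computation: the focus enters the target fragment at its root and then descends to the target CSS. The node names and the parent table below are invented for the example.

```python
# Hypothetical forest of CSSs: each node maps to its parent (None for a root).
PARENT = {
    "CSS1": None, "CSS2": "CSS1", "CSS3": "CSS2",   # first tree fragment
    "CSS4": None, "CSS5": "CSS4",                   # second tree fragment
}


def path_from_root(node, parent):
    """Return the list of nodes from the root of the fragment down to `node`."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]


def shift_focus(current, target, parent):
    """Nodes the focus visits when it moves to `target` in another fragment."""
    target_path = path_from_root(target, parent)
    if path_from_root(current, parent)[0] == target_path[0]:
        raise ValueError("current and target are in the same tree fragment")
    return target_path


visited = shift_focus("CSS5", "CSS3", PARENT)
```

The focus cannot jump directly from CSS5 to CSS3. It passes through CSS1 and CSS2 first, mirroring the general-to-specific progression described above.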
The component of the model operated upon by the 
attentional mechanism is the focus space. This 
consists of a list of items that we call discourse 
objects. The entities on the list can either have
properties in their own right, or can inherit them
from higher up in a classification hierarchy. The
reason for having highly structured objects in the
focus space is to allow for the resolution of
anaphoric references of the following type (after
Sidner, 1979):
A: I saw John's Irish Wolfhound yesterday.
B: Yes. They're really big dogs.
In (B) the phrase "They're" does not refer back to any
specific entity mentioned in (A), but rather to the
class of dogs of which John's is a member. In order
successfully to resolve this reference, knowledge
needs to be available to the resolution process
concerning the class of entities to which the
specific Irish Wolfhound mentioned belongs. The way
this is achieved in the model described here is to
allow the entities in the focus space to inherit
properties via a classification hierarchy.
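The Irish Wolfhound example can be sketched with discourse objects that inherit properties along "isa" links. The property names (species, size, owner) and the shape of the hierarchy are assumptions made for illustration, and the resolution function is one plausible strategy rather than the procedure of the model itself.

```python
class DiscourseObject:
    """An entity or class with local properties and an optional parent class."""

    def __init__(self, name, isa=None, **properties):
        self.name = name
        self.isa = isa
        self.properties = properties

    def get(self, prop):
        """Look up a property locally, then up the classification hierarchy."""
        obj = self
        while obj is not None:
            if prop in obj.properties:
                return obj.properties[prop]
            obj = obj.isa
        return None


def resolve_class_reference(focus_space, constraint):
    """Find the nearest class of a focused entity that satisfies `constraint`."""
    for entity in focus_space:
        cls = entity.isa
        while cls is not None:
            if constraint(cls):
                return cls
            cls = cls.isa
    return None


dog = DiscourseObject("dog", species="dog")
wolfhound = DiscourseObject("Irish Wolfhound", isa=dog, size="big")
johns_dog = DiscourseObject("John's Irish Wolfhound", isa=wolfhound, owner="John")
john = DiscourseObject("John")

focus_space = [john, johns_dog]
referent = resolve_class_reference(focus_space, lambda c: c.get("species") == "dog")
```

Only the individual dog is in the focus space, yet the plural reference resolves to the class "Irish Wolfhound", because the class is reachable from the individual and inherits its species from "dog".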
3.2 Intentional Structure 
As has been pointed out in the description of the 
dialogue structures, the topmost element of the 
structural hierarchy (the CSS) contains a pointer 
into a structure representing the purpose of the CSS. 
Grosz and Sidner refer to the set of such CSS purposes
as the intentional structure of the dialogue. In
essence, the CSS purposes are elements in the plan
underlying the dialogue. In the case of a person-machine
dialogue system, they are the actions that
the user wishes the system to perform. There are two
relationships that can hold between elements of the
intentional structure: dominance and
satisfaction-precedence. These represent goal/sub-goal
and pre-condition relationships, respectively.
The hierarchy of intentional elements is more or less 
isomorphic to the dialogue structure, as can be seen 
in Figure 3. Here, the dialogue structure is
represented by white boxes and the underlying
intentional structure by shaded boxes. Also note
that the intentional structure may be expanded by an
inferential process, without there being a
corresponding node in the dialogue structure.
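The two relationships can be treated as ordering constraints over purposes: a dominated purpose (sub-goal) must be satisfied before the purpose that dominates it, and a satisfaction-preceding purpose before the one it precedes. The sketch below derives one admissible order with the standard-library topological sorter (Python 3.9 or later). The purpose names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical intentional structure for a tabulation task.
DOMINATES = {"full_breakdown": ["degree_table", "graduate_table"]}
PRECEDES = [("degree_table", "graduate_table")]   # (before, after)


def satisfaction_order(dominates, precedes):
    """Return an order in which the purposes can be satisfied."""
    graph = {}   # maps each purpose to the purposes that must come first
    for goal, subgoals in dominates.items():
        graph.setdefault(goal, set()).update(subgoals)
        for subgoal in subgoals:
            graph.setdefault(subgoal, set())
    for before, after in precedes:
        graph.setdefault(after, set()).add(before)
        graph.setdefault(before, set())
    return list(TopologicalSorter(graph).static_order())


order = satisfaction_order(DOMINATES, PRECEDES)
```

With these constraints only one order is possible: the degree table, then the graduate table, then the dominating full breakdown. A cyclic set of constraints would make the sorter raise graphlib.CycleError.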
The specific details of the intentional structure are
dependent on the dialogue domain, unlike the dialogue
structure. In the following example of an
application of the model, the domain is that of
database interaction with the user performing the
specific task of tabulating data about students' ages
and courses. Each intentional component represents 
an action of tabulation, and the place that the 
action has in the intentional hierarchy is determined 
by the complexity of the table requested (or 
inferred). 
Figure 3: The relationship between dialogue
and intentional structure.
4 A SAMPLE APPLICATION 
The following dialogue (except S8) was collected as
part of a corpus of simulated person-machine
dialogues gathered for the studies described in
Egan, Harris, Harper, and Reilly (1986). S8 is
inserted to illustrate how an inferred intention can
be used by the system to direct the dialogue.
U1: How many students, both male and female,
under 16 or younger in the year degree
course?
S2: There are no students of that age group in
the College.
U3: Again in the 3 year degree course, how many
male and female students in the following
age groups: 19 20 21 22 23 25 or older
S4: Here is the table.
U5: Total number of both male and female
students in this course of study
S6: 153 males and 559 females.
U7: Please supply a breakdown of both male and
female students in the graduate course.
S8: Do you wish to see a complete sex by age by
course breakdown?
Figure 4 illustrates the unfolding of both the
dialogue and intentional structures (the numbers in
the boxes correspond to utterances). The intentional
structure underlying S8 is inferred on the basis that
the user has asked for the same breakdown for two
courses; therefore he or she may wish to have a
three-way breakdown for all courses. This inference
then gives rise to utterance S8, which is
incorporated into the dialogue structure. The left
of Figure 4 represents the state of the dialogue and
intentional structures up to and including utterance
U7. The right of the figure represents the
structures after S8.
Figure 4: A structural analysis of the sample
dialogue
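The inference behind S8 can be sketched as a simple generalisation rule: if the same kind of breakdown has been requested for at least two different courses, propose a breakdown that adds course as a further dimension. The request representation, the threshold of two, and the assumption that each request has already been reduced to a set of dimensions plus a course are all simplifications made for the example.

```python
def infer_generalisation(requests, threshold=2):
    """requests: list of (dimensions, course) pairs seen so far in the dialogue.

    Returns the dimensions of a proposed wider table, or None if no
    breakdown has been requested for `threshold` different courses.
    """
    courses_by_dims = {}
    for dims, course in requests:
        courses_by_dims.setdefault(frozenset(dims), set()).add(course)
    for dims, courses in courses_by_dims.items():
        if len(courses) >= threshold:
            return sorted(dims) + ["course"]
    return None


requests = [
    ({"sex", "age"}, "3 year degree"),   # roughly U3
    ({"sex", "age"}, "graduate"),        # roughly U7
]
proposal = infer_generalisation(requests)
no_proposal = infer_generalisation(requests[:1])
```

The proposal corresponds to the sex by age by course breakdown offered in S8. A fuller treatment would attach the inferred purpose to the intentional structure before generating the utterance.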
In U5, the reference to an unspecified course
("this course of study") requires that a referent be
found. The bi-directional links in the discourse
structure allow information from the focus spaces of
the connected nodes to be accessed in the resolution
process. Thus, the anaphoric reference in U5 can be
resolved by accessing the focus space of utterances
3 and 4. Note that the small disembodied arrows in
Figure 4 indicate the current attentional state of
the dialogue.
5 CONCLUSION 
The dialogue model outlined above is under-specified
in a number of important aspects. For example, no
algorithmic description has been provided that can
generate and utilise the data structures of the
model.
The research programme, of which the work described
here is a part, is still in the early stages of
implementation. However, a number of implementation
decisions have already been made which give some
indication of what the final system will look like.
Both the dialogue and intentional structures are to
be represented using a frame-based language. The
frames will be connected in a network. The
instantiation and interconnection of the frames will
be the job of a general control algorithm, while the
filling of many of the slots in the various frames
will be demon-driven. That is, associated with each
slot will be a function that is activated when data
is required for the slot, such as when the frame
containing the slot is instantiated. Limited use
will be made of the inheritance mechanism of the
frame system. Inheritance will be mainly used for
the inheritance of focus-space information. The
feature of frames that will be most utilised is that
of demon-driven slot filling.
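Demon-driven slot filling can be sketched as follows. This is not the frame language of the planned system, which is not specified here. It is a minimal stand-in in which a demon is a function attached to a slot and run the first time the slot's value is needed. The class, method, and slot names are assumptions.

```python
class Frame:
    """A frame with slots, if-needed demons, and a single parent frame."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.slots = {}
        self.demons = {}

    def add_demon(self, slot, function):
        """Attach a function that computes `slot` when its value is required."""
        self.demons[slot] = function

    def get(self, slot):
        """Return a stored, demon-computed, or inherited value for `slot`."""
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            if slot in frame.demons:
                value = frame.demons[slot](self)
                self.slots[slot] = value   # cache so the demon fires only once
                return value
            frame = frame.parent
        raise KeyError(slot)


demon_calls = []


def find_speaker(frame):
    """Hypothetical demon: records that it fired and supplies a speaker."""
    demon_calls.append(frame.name)
    return "user"


css_frame = Frame("CSS1")
css_frame.slots["focus"] = ["3 year degree course"]
cs_frame = Frame("CS1", parent=css_frame)
cs_frame.add_demon("speaker", find_speaker)

speaker = cs_frame.get("speaker")
speaker_again = cs_frame.get("speaker")
inherited_focus = cs_frame.get("focus")
```

The second request for the speaker is answered from the cached slot, and the focus information is reached through inheritance from the CSS frame, the main use of inheritance mentioned above.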
This paper reports research carried out as part of
the CFID project, supported in part by the ESPRIT
programme of the CEC (ref. I~/84 A\]r %'/).

REFERENCES 

Barwise, J., & Perry, J. (1983). Situations and
attitudes. Cambridge, MA: Bradford/MIT Press.

Burton, D. (1981). Analysing spoken discourse. In
M. Coulthard & M. Montgomery (Eds.), Studies in
discourse analysis. London: Routledge & Kegan
Paul.

Christie, B., Egan, O., Ferrari, G., Gardiner, M.,
Harper, J., Reilly, R., & Sheehy, N. (1985).
D~\]~fl.verab l__o: ......... \]. ..... o f~_._. L~S__PRI'\[_..I~.rqj.e$:; t 4 ._. !~ 2 \]..;. A!~
classification system. Educational Research
Centre, Dublin.

Egan, O., Ferrari, G., Harper, J., Prodanof, I.,
Reilly, R., Sebastiani, F., & Sheehy, N. (1987).
D~\]~fl.verab l__o: ......... \]. ..... o f~_._. L~S__PRI'\[_..I~.rqj.e$:; t 4 ._. !~ 2 \]..;. A!~
Educational Research Centre, Dublin.


Egan, O., Forrest, M-A., Gardiner, M., Reilly, R., &
Sheehy, N. (1986). ))2~:t£v.e~,::Ll.,J~e_.._;1__. _q_l g$!~!;!
Educational Research Centre, Dublin.

Egan, O., Harper, J., Harris, J., & Reilly, R. (1986).
Dialogue studies main ~5~_~, Educational
Research Centre, Dublin.

Grosz, B. (1977). The representation and use of focus
in dialogue understanding. Unpublished PhD
thesis, University of California, Berkeley.

Grosz, B. J., & Sidner, C. L. (1986).
Attention, intentions, and the structure of
discourse. Computational Linguistics,
12, 175-204.

Reilly, R. (in press). An ATN-based grammar for the
structural analysis of dialogue. In N. E.
Sharkey (Ed.), ~._o!l~l~!!g__3_'.£~g~!_!~J.~_U).',~.. a!! 6L4!!'2~ I
review of cognitive science. Norwood, NJ: Ablex.

Sidner, C. L. (1979). Towards a computational theory
of definite anaphora comprehension in English
discourse. Technical Report 537, MIT Artificial
Intelligence Laboratory, Cambridge, MA.
