COLLABORATION ON REFERENCE TO OBJECTS 
THAT ARE NOT MUTUALLY KNOWN 
Philip G. Edmonds 
Department of Computer Science, University of Toronto 
Toronto, Ontario, Canada M5S 1A4 
pedmonds@cs, toronto, edu 
ABSTRACT 
In conversation, a person sometimes has to refer to 
an object that is not previously known to the other 
participant. We present a phm-based model of how 
agents collaborate on reference of this sort. In making 
a reference, an agent uses the most salient attributes 
of the referent. In understanding a reference, an agent 
determines his confidence in its adequacy as a means 
of identifying the referent. To collaborate, the agents 
use judgment, suggestion, and elaboration moves to 
refashion an inadequate referring expression. 
1 INTRODUCTION 
In conversation, a person sometimes has to refer to 
an object that is not previously known to the other 
participant. One particular situation in which this 
arises is in giving directions. For example: 
Example 1 
1 A: Go straight ahead until you get to a funny- 
looking building. 
The hearer has to understand the reference well 
enough that when he later reaches the building, he 
will recognize it as the intended referent. 
A reference of this sort is often achieved through a 
collaboration between the conversants. In such cases, 
the speaker has the goal of having the hearer know how 
to identify an object. The speaker attempts to achieve 
this goal by building a description of the object that 
she believes will give the hearer the ability to identify 
it when it is possible to do so. The hearer needs to 
be confident that the description will be adequate as a 
means of identifying the referent, but because of the 
inevitable differences in beliefs about the world, he 
might not be. When the hearer is not confident, the 
speaker and hearer collaborate to make a new referring 
expression that the bearer believes is adequate. This 
can be seen in the following portion of a telephone 
conversation recorded by Psathas (1991, p. 196). 
Example 2 
1 A: Yajust stay on 2A, until ya get to Lowell Street. 
2 B: Is it marked? 
3 A: Yeah, I think there's a street sign there, it's an 
intersection with lights. 
4 B: Okay. 
In this dialogue, speaker B is not confident that he will 
be able to identify the intersection at Lowell Street, 
and so suggests that the intersection might be marked. 
Speaker A replies with an elaboration of the initial 
expression, and B finds that he is now confident, and 
so accepts the reference. 
This type of reference is different from the type that 
has been studied traditionally by researchers who have 
usually assumed that the agents have mutual knowl- 
edge of the referent (Appelt, 1985a; Appelt and Kro- 
nfeld, 1987; Clark and Wilkes-Gibbs, 1986; Heeman 
and Hirst, 1992; Searle, 1969), are copreseut with the 
referent (Heeman and Hirst, 1992; Cohen, 1981), or 
have the referent in their focus of attention (Reiter and 
Dale, 1992). In these theories, the speaker has the 
intention that the hearer either know the referent or 
identijy it immediately. 
Although the type of reference that we wish to 
model does not rely oll these assumptions, we can nev- 
ertheless draw from these theories. Thus, we base our 
model on the work of Chtrk and Wilkes-Gibbs (1986), 
and Heeman and Hirst (1992) who both modeled (the 
first psychologically, and the second computationally) 
how people collaborate on reference to objects for 
which they have mutual knowledge. We will briefly 
discuss these models, before we describe our own. 
2 COLLABORATION ON REFERENCE 
In their t'nndamental experiment, Clark and Wilkes- 
Gibbs (1986) demonstrated that conversants use a set 
of inherently collaborative procedures to establish the 
mutual belief that the hearer has understood a refer- 
ence. In the experiment, two subjects were each given 
a set of hard-to-describe tangram figures that were 
kept hidden from the other. One subject was required 
to get the other subject to rearrange his set to match the 
ordering of her set, and to do so through conversation 
alone. Thus, the two subjects were obliged to collab- 
orate on constructing descriptions of the figm'es that 
would allow them to be unambiguously identified; for 
example, the one that looks like an angel with a stick. 
Clark and Wilkes-Gibbs developed the following 
process model to explain their findings. To initiate 
the process, speaker A presents an initial version of a 
referring expression on which speaker B passes jndg- 
ment. 13 can either accept it, reject it, or postpone 
his decision until later. If B rejects o1" postpones, then 
1118 
the expression must be refashioned by either A or 13. 
Refashionings are accomplished in three main ways: 
repairing the expression by correcting speech en'ors, 
expanding the expression by adding more qualifica- 
tions, or replacing part or all of the expression with 
new qualifications. Each judgment/refashioning pair 
operates on the current referring expression, replacing 
it with a new one. This process continues until the 
expression, kept in the participants' common ground, 
is mutually accepted. 
This excerpt from Clark and Wilkes-Gibbs's data 
illustrates rejection (line 2), replacement (line 2), and 
acceptance (lines 3 and 4): 
Example 3 
1 A: Okay, and the next one is tile person that looks 
like they're carrying something and it's stick- 
ing out to the left. It looks like a hat that's 
upside down. 
2 B: The guy that's pointing to tile left again? 
3 A: Yeah, poiuting to the left, that's it\[ /laughs/ 
4 B: Okay. 
Heernan and Hirst (1992) rendered Clark and 
Wilkes-Gibbs's model computationally by casting it 
into the planning paradigm. Their model covers both 
the initiator of areferring action, and the recipie/~t wbo 
tries to understand the reference. In this moctel, the 
initiator has the goal of having tire recipient identify 
the referent, and so constructs a referring plan given 
a set of beliefs about what the recipient believes. The 
result of the initiator's plan is a set of surface speech 
actions, and hearing only these actions, the recipient 
tries to infer a plan in order to uuderstand the ref- 
erence. Thus, referring expressions are represented 
as plan derivations, and an unsuccessful referring ex- 
pression is an invalid plan in whose repair the agents 
collaborate. 
An agent can infer a plan even if it is invalid in that 
agent's view (Pollack, 1990). The evahmtion process 
attempts to find an instantiation of the variables such 
that all of the constraints are satisfied and the mental 
actions executable with respect to the hearer's beliefs 
about the speaker's beliefs. 
If tile recipient linds the initial referring expression 
phm invalid, then the agents will collaborate in its 
repair. Heeman and Hirst used plan repair techniques 
to refashion an expression, and used discourse phms, 
or recta-plans, to communicate the changes to it. Thus, 
a collaborative dialogue is modeled in terms of the 
evolution of tile referring plan. 
First, an agent lnust comnmnicatc that she has 
not understood a phm. Depending on how the re- 
ferring plan coustrains tile choice of referent, she 
constructs an instance of either reject-plan or 
postpone-plan, whose resulting surface speech ac- 
tions arc s-rojecl; and s-postpone respectively. 
Next, one agent or tile other must refashion the re- 
ferring expression plan in the context of the judg- 
ment by either replacing some of its actions (by using 
replace-plan) or by adding new actions to it (by 
using expand-plan). Tile result of both plans is the 
surface speech action a-actions. 
Because the model can play the role of both the 
initiator and the recipient, and because it can perform 
both plan construction and inference, two copies of 
the model can converse with one another, acting al- 
ternately as speaker and hearer. Acting as hearer, oue 
copy of the system performs plan in ference on each set 
of surface speech actions that it observes, and updates 
the state of the collaboration. It then switches roles 
to become the speakel, and looks lot a goal to adopt, 
and constructs a plan that achieves it. After respond- 
ing with the surface actions of the phm, it updates the 
state of the collaboration, presupposing that the other 
copy will accept the plan. The system repeats the pro- 
cess until it can lind no more goals to adopt, at which 
time it switches back to being the hearer and waits for 
a response from the other copy. 
3 CONFIDENCE AND SALIENCE 
A crucial assumption of Clark and Wilkes-Gibbs's 
work- -and of l~leeman and Hirst's model--is that the 
recipient of the initial referring expression already 
has some knowledge of ttle referent in question. In 
Clark and Wilkes-Gibbs's experiments, for: example, 
it is one of the tangram figures. In other words, tile 
hearer can understand a referring expression if its con- 
tent uniquely describes an object that he knows about. 
Obviously, an agent cannot use this criterion to under- 
stand the reference to the building in Example l---he 
has never heard of the buildiug before. What criteria, 
then, does he base his understanding on? 
The basis of our model is that the hearer can accept 
a referring expression plan if (1) the plan contains a 
description that is use./M for making an identification 
plan that tile hearer can execute to identify the referent, 
and (2) the hearer is confident that the identification 
plan is adequate. 
The first condition, originally described by Ap- 
pclt (1985b), is important because the success of the 
refen'ing action depends on the hearer formulating a 
uselifl identilication plan. We take the referring ex- 
pression plan itself to be the identification plan. The 
mental actions in the plan will encode only useful de- 
scriptions. For the second condition to hold, the hearer 
must believe that the identification plan is good enough 
to tmiquely identify the referent when it becomes visi- 
ble. This inw)lves giving enough information by using 
the most salient atributes of the referent. 
hl otrr model, each agerlt associates a numeric con- 
fidence value with each of tile attributes in the relSr- 
ring expression, and by composing these, computes a 
1119 
level of confidence in the adequacy of the complete 
referring expression plan that can be interpreted as 
ranging from low confidence to high confidence. The 
present composition function is simple addition, but 
one could envision more complex systems to com- 
pute confidence, such as an algebra of confidence or a 
non-numeric system. If the overall confidence value 
exceeds some set value, the agent's confidence thresh- 
old, then the agent believes the plan is adequate. That 
is, if the agent is the initiator, she believes that the 
other will be able to understand the reference; if the 
agent is the other, he believes that he has understood 
the reference. 
Now, the confidence value of each attribute is equiv- 
alent to its salience within the context of the referring 
expression. Salience, for our purposes in direction- 
giving, is primarily visual prominence, but can also 
involve identifiability, familiarity, and functional im- 
portance (Devlin, 1976; Lynch, 1960). One approach 
is to encode the salient properties in a static hierar- 
chy as Davis (1989), and Reiter and Dale (1992) have 
done. I But, ideally, salience should depend on the 
context sun-ounding the referent. For example, the 
height of a tall building would normally be salient, 
but not if it were surrounded by other tall buildings. 
This computation Would be quite complex, so we have 
adopted a middle ground between the simple context- 
independent approaches, and a full-blown contextual 
analysis. The middle ground involves taking the type 
of object into account when choosing attributes and 
landmarks that relate to it. For example, height and 
architectural style can be very salient features for de- 
scribing a building, but not for describing an intersec- 
tion, for which having a sign or traffic lights is impor- 
tant. This approach still allows us to encode salience 
in a hierarchy, but it is dependent on the referent. 
Table 1 shows an example of a simple salience hi- 
erarchy that an agent might have. The hierarchy is 
actually a set of partial orderings of attributes, repre- 
sented by lambda expressions, indexed by object type. 
In the table, the confidence value of using architectural 
style to describe a building is 4. The confidence value 
of a tall building is 3, and so this attribute is less 
salient than architectural style. The other rows (for 
describing intersections) follow similarly. 2 
Each agent has his own beliefs about salience. It is 
the difference in their beliefs that leads to the necessity 
for collaboration on reference. Ideally, the initiator 
should construct referring expressions with the recipi- 
ents' (believed) beliefs about salience in mind, but we 
have chosen to avoid this complexity by making the 
simplifying assumption that the initiator is an expert 
IThese models assmne that all agents have identical beliefs, 
which is clearly insufficient for modeling collaborative dialogue. 
2Given information about salience, we could construct such a 
hierarchy, but we do not presume that it would be easy to know 
what is salient. 
(and thus knows best what is salient). 
4 PLANS FOR REFERRING 
An agent uses his salience hierarchy for two related 
purposes: the first to determine what is salient in a par- 
ticular situation, and the second to determine the ade- 
quacy of a description. So, the hierarchy is accessed 
during both plan construction and plan inference. 
In plan construction, the hierarchy is used for con- 
structing initial referring expression plans, and for 
elaborating on inadequate plans by allowing an agent 
to choose the most salient properties of the referent 
first. The agent constructs an initial referring expres- 
sion plan in almost the same way as in Heeman and 
Hirst's system. Mental actions in the intermediate 
plans of a referring expression plan allow the speaker 
to choose the most salient attributes that have not yet 
been chosen, and constraints in the surface speech ac- 
tions make sure the speaker believes that each attribute 
is true. 3 Other mental actions in the intermediate plans 
add up the confidence values of the attributes, and a 
final constraint makes sum that the sum exceeds the 
agent's confidence threshold. So, for a referring plan 
to be valid, it must describe a unique object, and it 
must be adequate with respect to the speaker's beliefs. 
This means that attributes beyond those required \['or a 
unique description could be necessary. For example, 
to construct the reference to the building in Exam- 
ple l, the speaker consulted her salience hierarchy 
(in table 1) and determined that architectural style is 
salient. Hence, she described the building as Jknny- 
looking. This single attribute was enough to exceed 
her confidence threshold. 
During plan inference, the salience hierarchy is 
used when evaluating a recognized plan. The men- 
tal actions in the intermediate plans determine the 
confidence values of each attribute (from the heamr's 
salience hiemmhy), and add them up. The final con- 
straint in the plan makes sure that the hearer's confi- 
dence threshold is exceeded. Thus, judging the ade- 
quacy of a referring expression plan falls out of the 
regular plan evaluation process. If the final constraint 
does not hold, then the invalidity is noted so that the 
plan can be operated on appropriately by the discourse 
plans. 
For example, after recognizing the reference in Ex- 
ample l, the hearer evaluates the plan. Assuming he 
believes the salience information in table 1, he com- 
putes the confidence value of 4. If this value exceeds 
his confidence threshold, then he will accept the plan. 
If not, he will believe that them is an error at the 
constraint that checks his confidence threshold. 
~ln tleeman and Itirst's model, an attribute has to be mutually 
believed to be used. Here, mutual belief is not possible because 
the bearer has no knowledge of the referent, but mutual belief is 
an intended effect of using this plan. 
1120 
Table 1: A salience hierarchy for an agent. 
salient-attribute(4, building, lX.architectural-style(X,Style)). 
salient-attribute(3, building, AX.height(X,tall)). 
salient-attribute(3, intersection, AX.called(X,Name)). 
salient-attribute(2, intersection,sign, AX.AY.has(X,Y)). 
salient-attribute(2, intersection,traffic-lights, AX.~Y-has(X,Y)). 
5 SUGGESTION AND ELABORATION 
If the recipient is not confident in the adequacy of the 
plan, he uses an instance of postpone-plan to inlbrm 
Ihe initiator that he is not conlident ol' its adequacy, 
thereby causing tile initiator to raise her own confi- 
dence threshold. Now, allhough he cannot refashion 
the expression himself, he does have the ability to 
help the initiator by suggesting a good way to expand 
it; suggestion is a conversational move in which an 
agent suggests a new attribute that he deems would 
increase his confidence in the cxpressiou's adequacy 
if the expression were expanded to include the at- 
tribute. Coutinuing with the example, if the hearer 
were uol confident about the adequacy of the.fitnny- 
looking building, hc might suggest that the initiator 
use height (as well as architectural style), by asking Is 
it tall?. Frmn this suggestkm the initiator might ex- 
pand her expression to the tallftmny-looking building. 
So, in our sense, a suggestion is an illocutionary act 
of questioning; along with actually suggesting a way 
to expand a plan, the agent is asking whether or not 
tile referent has lhe suggested attribute. 
To decide what suggestion to make, tile agent uses 
an instance of suggest-expand-plan, which has a 
mental action in its decomposition that chooses the 
attribute that he believes is the most salient that has 
not been used ah'eady. Tile result of tile plan is the 
surface speech action, s-suggest, that communicates 
the suggestion. 
However, only the initiator of tile referring expres- 
sion can actually elaborate a referring expression, be- 
cause only she has tile knowledge to do so. Depend- 
ing on whether the hearer of the expression makes a 
suggestion or not, the initiator has two options when 
elaborating a plan. If no suggestion was made, then 
she can expand the plan according to her own beliefs 
about the referent's attributes and their salience. On 
the other hand, ira suggestion was made, she could in- 
stead attempt to expand the plan by aflinning or deny- 
ing the attribute suggested. \[f possible, she shouM 
use the suggestion to elaborate tile plan, lhus awfiding 
unwanted conversational implicature, but its use may 
not be enough to make tim plan adequate. 
Tile decomposition of expand-plan calls the plan 
constructor with the goal el'constructing a modif iers 
schema and with the suggested attribute as input-- 
in a sense, continuing the construction of tim initial 
referring plan. The plan constructor attempts to find a 
plan with the surface speech actions for the snggested 
attribute in its yield, but this might not be possible. In 
any case, the speaker constructs an expansion that will 
make the plan adequate according to her beliefs. 4 
The response to a suggestion depends, obviously, 
on whether or not the suggestion was used to ex- 
pand tile plan. The speaker can (1) affirm that the 
plan was expanded with the suggestion by using the 
s-affirm speech act; (2) aflirm that the suggestion 
was used, along with additional attributes that weren't 
suggested, by using s-affirm and s-actions; or 
(3) deny the suggestion with s-deny, and inform the 
other by s-act ions as lo how the plan was expanded. 
By repeatedly using the postponement, elaboration, 
aud suggestion moves, the two agents collaborate 
through discourse on refashioning the referring ex- 
pression until they mutually believe tlmt the recipient 
is conlidcnt that it is adequate. 
6 EXAMPLE 
We have implemented tile model in Prolog. Table 2 
shows the input/output of two copies of the system cue 
gaging in a simplified version of Example 2. Note that 
the system generates and tmderstands utterances in the 
form of descriptions of the smTace speech actions, not 
surface natural language forms. The existence of a 
parscr and a generator that can map between the two 
forms is assumed. Complete details of this example 
and of the model are given by Edmonds (I 993). 
7 CONCLUSION 
When an agent refers to a particular object that is 
not previously known to another agent, she has tile 
intentiou that tile agent be able to identify the object 
(when it is possible to do so) by means of the referring 
exln'ession. Because of the inevitable differences in 
their beliefs about the world -specifically about what 
is salient--the agents may have to collaborate to make 
the expression adequate. 
4Recall that she raisect her confidence thrcshoM as a result of 
the heater's postponement move, so now she must meet Ihe flew 
threshold. 
1121 
Table 2: Example of suggestion and elaboration. 
1 A: Gotothe LowellStreetintersection. 
s-goto(Entity) 
s-refer(Entity) 
s-attrib(Entity,~X.category(X,intersection)) 
s-attrib(Entity,~X-called(X,'Lowell Street')) 
2 B: Doesithave asign? 
s-postpone(pl) 
s-suggest(pl,\[s-attrib-rel(Entity,Entity2,AX.AY.has(X,Y)), 
s-refer(Entity2), 
s-attrib(Entity2,AX.category(X,sign))\]) 
3 A: Yes, itdoes, anditalsohastrafficlights. s-affirm(pl,\[s-attrib-rel(Entity,Entity2,AX.AY.has(X,Y)), 
s-refer(Entity2), 
s-attrib(Entity2,AX-category(X,sign))\]) 
s-actions(pl,\[s-attrib-rel(Entity,Entity3,AX.AY.has(X,Y)), 
s-refer(Entity3), 
s-attrib(Entity3,AX.category(X,traffic-lights))\]) 
4 B: Okay. 
s-accept(p123) 
We have implemented a computational plan-based 
model that accounts for the collaborative nature of ref- 
erence in the domain of interactive direction-giving. 
An agent constructs a referring expression plan by 
using the referent's most salient features. An agent 
understands a reference once he is confident in the ad- 
equacy of its (inferred) plan as a means of identifying 
the referent. To collaborate, the agents use judgment, 
suggestion, and elaboration moves to refashion the re- 
ferring expression until they mutually believe that the 
recipient has understood. 
Acknowledgments 
Our wm'k is supported by the University of'lbronto and by the Nat- 
nral Sciences and Engineering Research Council of Canada. We 
are gratefld to Peter Heeman, Graeme llirst, and Jeffrey Siskind 
for many helpful discussions. 
References 
Douglas E. Appelt (1985a). Planning English referring expres- 
sions. Artificial Intelligence, 26( 1 ): 1-33. 
Douglas E. Appelt (1985b). Some pragmatic issues in the plan- 
ning of definite and indefinite noun phrases. In Proceedings 
of the 23rd Annual Meeting of the Association Jbr Compu- 
tational Linguistics, pages 198-203. 
Douglas E. Appelt and Amichai Kmnfeld (1987). A computa- 
tional model o1' referring. In Proceedings of the Tenth Inter- 
national Joint Conference on Arli\[icial hltelligence (lJCA1- 
87), pages 640-647. 
Herbert H. Clark and Deanna Wilkes-Gibbs (1986). Refen'ing as 
a collaborative process. Cognition, 22:1-39. Reprinted in 
Cohen, E R., Morgan, J., and Pollack, M. E., editors. (1990). 
Intentions h~ Communication. MIT Press. pages 463-493. 
Philip R. Cohen (1981). The need for referent identification as a 
planned action. In Proceedh~gs of the Seventh haernational 
Joint Conference on Artificial Intelligence (HCAI-81 ), pages 
31-36. 
James Raymond Davis (1989). Back Seat Driver: Voice As- 
sisted Automobile Navigation. l'h.D, thesis, Massachusetts 
Institute of %chnology. 
Ann S. Devlin (1976). The "small town" cognitive map: Adjust- 
ing to a new environment. In G.T. Moore and R.G. Golledge, 
editors, Environmental Knowing: Theories, Research and 
Methods. Dowden, tlutchinson and Ross. 
Philip G. Edmonds (I 993). A computational model of collabora- 
tion on reference in direction-giving dialogues. M.Sc. the- 
sis, published as technical report CSRI-289, Department of 
Compnter Science, University of Toronto. 
Peter A. Heeman and Graeme Hirst (1992). Collaborating on 
refen'ing expressions. Technical Report TR 435, Computer 
Science Dept., Univ. of Rochester, Rochester, New York. 
Kevin Lynch (1960). The hnage of the City. M1T Press. 
Mm~ha Pollack (1990). Plans as complex mental attitudes. In 
R R. Cohen, J. Morgan, and M. E. Pollack, editors, Inten- 
tions in Communication, pages 77-103. MIT Press. 
George Psathas (1991 ). The structure of direction-giving in inter- 
action. In Deirdre Boden and Don I1. Zimmermau, editors, 
Talk and Social Structure, pages 195-216. Polity Press. 
Ehud Reiter and Robert Dale (1992). A fast algorithm for the 
generation of referring expressions. In Proceedings of the 
ldth International Conference on Computational Linguis- 
ticks I (COLING-92), pages 232-238. 
John. R. Searle (1969). SpeechActs: An Essay in the Philosophy 
q\[Language. Cambridge University Press. 
1122 
