Combining Deictic Gestures and Natural Language 
for Referent Identification 
Alfred Kobsa, Jürgen Allgayer, Carola Reddig, Norbert Reithinger
Dagmar Schmauks, Karin Harbusch, Wolfgang Wahlster 
SFB 314: AI - Knowledge-Based Systems 
University of Saarbrücken
D-6600 Saarbrücken 11
West Germany 
Abstract 
In virtually all current natural-language dialog systems, users
can only refer to objects by using linguistic descriptions.
However, in human face-to-face conversation, participants frequently
use various sorts of deictic gestures as well. In this paper, we
will present the referent identification component of XTRA, a
system for natural-language access to expert systems. XTRA allows
the user to combine NL input with pointing gestures on the
terminal screen in order to refer to objects on the display.
Information about the location and type of this deictic gesture,
as well as about the linguistic description of the referred
object, the case frame, and the dialog memory is utilized for
identifying the object. The system tolerates imprecision in both
the deictic and the natural-language input. The user can thereby
refer to objects more easily, avoid referential failures, and
employ vague everyday terms instead of precise technical notions.
Keywords: Deixis, referent identification, NP analysis, parsing 
1. Introduction 
Various aspects of referent identification by hearers have been 
investigated in the last few years: It has been studied as a
process of noun phrase resolution and attribute comparison (Lipkis
1982), as a planned action (Cohen 1981, 1984), as a process which
depends on focus (Grosz 1981), context (Reichman 1981), the mutual
beliefs shared between speaker and hearer (Clark & Marshall 1981)
and the modality of linguistic communication (telephone vs.
teletype, cf. Cohen 1984), and as a
process which is prone to various sorts of conversational failure 
(Goodman 1985). In all of these studies, natural language is 
the only conversational medium. For identifying objects un- 
der discussion, the hearer can therefore only utilize the NL de- 
scriptions provided by the speaker, and information about the 
previous dialog and the task domain at hand. 
In face-to-face conversation, however, participants also
frequently use extralinguistic means for referent identification,
in particular, various sorts of deictic gestures (such as pointing
at something with one's hand, finger, pencil, head or eyes). One
may assume that this is done to simplify and speed up the
identification process for both the hearer and the speaker, as
well as to avoid referential failures. Certain technical
innovations in the last few years (e.g., high-resolution graphic
displays, window systems, touch-sensitive screens, input via a
pointing device such as the mouse or the light-pen) have made it
possible for computational linguistics to also experiment with and
study a certain class of these deictic gestures, namely, tactile
gestures for identifying objects on a terminal screen.

This work is part of the SFB 314 Research Program on AI and
Knowledge-Based Systems and has been supported by the German
Science Foundation (DFG). The authors would like to thank J. Rink
and W. Finkler for their help in preparing the manuscript. E-mail
address of the authors: surname%sbsvamuucp~germany.csnet

From an application-oriented perspective as well, such an abil- 
ity is certainly a desirable characteristic for natural language 
dialog systems. In current systems, referring to visual objects
requires the user to employ either unambiguous labels displayed
together with the objects (cf. Phillips et al. 1985), or purely
linguistic descriptions which sometimes become rather complex
(e.g. the "bright pink flat piece of hippopotamus face shape piece
of plastic" in Goodman 1985). In Woods et al. (1979), a
combination of deictic and natural language input has already been
envisaged, but only with limited flexibility. Since an analyzer for
pointing gestures is independent of a particular language, one 
might also consider transferring it to other NL dialog systems. 
In this paper, we will present the referent identification
component of XTRA, a system for natural-language access to expert
systems currently under development at the University of
Saarbrücken. In its present application domain, XTRA is intended
to assist a user in filling out his/her annual withholding tax
adjustment form. The system will respond to terminological
questions of the user, extract from the user's natural-language
input the relevant data that is to be entered in the
application form, and verbalize the inferences of the tax expert 
system. During the dialog, the relevant page of the application 
form is displayed on one window of the screen (for a simplified 
example, see Fig. 1; only the tax form is visible to the user). 
For referring to single regions in the form, to the entities stored 
therein, or to larger regions which contain embedded regions, 
the user can employ linguistic descriptions (which we will call 
descriptors), pointing gestures with a pointing device (mouse), 
or both. From now on, the noun 'deictic' will refer to the use
of a pointing device, and the term 'deictic expression' to the
use of a descriptor plus a deictic (such as 'these deductibles' + 
deictic), or of a deictic alone. 
In Bühler's (1982) terminology, the kind of deixis used in our
situation is a demonstratio ad oculos. The objects on the display
are visually observable, and the user and the system share a
common visual field. In Clark & Marshall's (1981) terms, they are
in a situation of physical copresence. Therefore, objects on the
display need not be introduced by the user, but can immediately
be referred to by a descriptor, a deictic, or both.
In many cases, however, neither kind of reference will be
precise. Referential expressions, on the one hand, will often apply
to more than one region in our form (as is the case when the user
employs the term 'the deductibles' in order to refer to specific
deductible sums such as dues for membership in a professional
organization). Deictic gestures, on the other hand, are also often
imprecise in that they are not aimed at the region to which the
user actually wants to refer. Reasons for this might be
inattentiveness, an oversized pointing device, or the user's
intention not to hide the data entered in the respective field.
Another factor of uncertainty is the pars-pro-toto deictic. In
this case, the user points at an embedded region when actually
intending to refer to a superordinated region. This is
particularly the case when a form region is completely partitioned
into a number of embedded sub-regions.
Therefore, in our model, we utilize several sources of
information for identifying the region the user probably wants to
refer to: the descriptor s/he uses, the location and the type of
his/her pointing gesture, intrasentential context (case frames),
and the dialog context. The information from each of these sources
alone may be ambiguous or imprecise. Combined, however, they
almost always allow for a precise identification of a referent.
2. Knowledge sources of the system 
2.1. The tax form and the form hierarchy 
During the dialog with the user, the system displays the relevant
page of the income tax form on the terminal screen. As is
illustrated in Fig. 1, such a form consists of a number of
rectangular regions, which may themselves contain embedded
regions, etc.
We will abbreviate these regions by R1, R2, etc. The user can 
apply deictic operations to all regions. 
For representing hierarchical relationships between regions, the 
system maintains an internal form hierarchy. Every region in
the form has a corresponding element in the form hierarchy. 
Hierarchical relationships between form elements can then be 
expressed by father-son relationships within the form hierarchy. 
There are two reasons for introducing such a hierarchical order: 
- Geometrical reasons: If region Rj is geometrically embedded
in region Ri, then the element in the form hierarchy
corresponding to Rj becomes a son of the element corresponding
to Ri. An example is given in Fig. 1 where regions R2 and
R3 are geometrically embedded in R1. Hence, their
corresponding elements in the form hierarchy are subordinated to
the element corresponding to R1.
- Semantic reasons: In many cases, there is a semantic
coherence between regions in the form not directly expressed by
the geometrical hierarchy. For example, see regions R15 and
R16, and regions R33 and R34 in Fig. 1, which intuitively
form units within the form for which no direct geometrical
equivalents exist. Therefore, so-called abstract regions are
introduced in the form hierarchy to which conceptually
coherent regions can be connected. These regions need not even
be geometrically adjacent and can be subordinated to
more than one abstract region. In Fig. 1, abstract regions
are denoted by the symbol 'AR' (e.g. AR48, the father
of R15 and R16). It is not surprising that abstract units in
the form hierarchy are often directly related to higher-level
representational elements in the conceptual knowledge base
of the system (cf. section 2.3.).
Moreover, we discern two types of bottom regions: Label regions
contain the official inscriptions on the form (e.g. LR9 for
'Professional Expenses'), value regions contain the space for the
user's data (e.g. VR28 for educational expenses). From now
on, we will no longer distinguish between the form as displayed
on the screen and the form hierarchy stored in the system. Since
a close relationship exists between both structures, no problems
will arise thereby.
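The form hierarchy described in this section can be sketched as a small data structure. The following is a minimal illustration in Python, not the XTRA implementation: the class name `Region`, its fields, and the `ancestors` helper are assumptions introduced here. Placing AR48 below R2 is also hypothetical, since the text only states that R2 and R3 are embedded in R1 and that AR48 is the father of R15 and R16.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Region:
    """One element of the form hierarchy (hypothetical representation)."""
    name: str
    kind: str  # "geometric" or "abstract"
    fathers: List["Region"] = field(default_factory=list)  # several fathers allowed

    def ancestors(self) -> List["Region"]:
        """Return all superordinated regions, nearest first, each only once."""
        seen = set()
        ordered = []
        queue = list(self.fathers)
        while queue:
            region = queue.pop(0)
            if region.name not in seen:
                seen.add(region.name)
                ordered.append(region)
                queue.extend(region.fathers)
        return ordered


r1 = Region("R1", "geometric")
r2 = Region("R2", "geometric", [r1])      # R2 and R3 are embedded in R1
r3 = Region("R3", "geometric", [r1])
ar48 = Region("AR48", "abstract", [r2])   # assumed position of AR48
r15 = Region("R15", "geometric", [ar48])
r16 = Region("R16", "geometric", [ar48])

print([region.name for region in r16.ancestors()])
```

Keeping a list of fathers rather than a single father reflects the statement that a region can be subordinated to more than one abstract region.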
2.2. The pointing gestures 
Following Clark et al. (1983), we will call the region(s) at which
the user pointed the demonstratum, and the region to which
s/he intended to refer the referent. Three cases can then be
discerned:
a) The demonstratum is identical to the referent. 
b) The demonstratum is a descendant of the referent (pars-pro- 
toto deixis). In this case, the referent may be a geometrical 
or an abstract region. 
c) The demonstratum is geometrically adjacent to the referent.
This occurs when the user points below the referent, to its
right, etc. (e.g., due to inattentiveness or because of not wanting
to hide the data entered in the respective region).
In most cases, obviously, the location of a deictic does not
identify its referent, but only restricts the set of possible
referential candidates. Therefore, information about the pointing
gesture usually has to be combined with information from other
knowledge sources.
Another observation was that most subjects use several types of
pointing gestures differing in exactness. Their choice seems to
depend on the size of the target region. The larger the referent
and the more sub-regions it contains, the vaguer the pointing
gesture. Therefore, our system allows the user to choose
among several degrees of accuracy in his/her deictic. The user's
decision, in turn, is taken into account when the system has to
choose between referential candidates differing in size or in the
degree of embedment (cf. section 3.1.2.).
2.3. The conceptual knowledge base
In our system, conceptual knowledge is represented by a
frame-based language that shows a strong resemblance to Brachman's
(1978) KL-ONE. The general part of the representation
contains concepts and attribute descriptions of concepts. Attribute
descriptions mainly consist of roles and value restrictions for
possible role fillers. In Fig. 1, concepts are depicted by ovals and
roles by small circles (the figure has been somewhat simplified).
For object concepts (e.g. 'MEMBERSHIP FEE' and
'ORGANIZATION'), attribute descriptions specify the properties
of the objects described by the concept. For action concepts
(e.g. 'PHYSICAL TRANSFER', 'ADD' etc.), they specify the
case frame.
General concepts can be ordered in a concept hierarchy, allowing
the attribute descriptions of concepts to be inherited from the
superordinated concepts. In Fig. 1, the bold arrows denote such
superconcept relations. More specific concepts can be defined
by introducing additional attribute descriptions or by further
restraining the value restrictions of role fillers. It is possible for
a concept to be subordinated to more than one superconcept,
thus inheriting the properties of several superconcepts.
[Fig. 1: The knowledge sources of the system (conceptual knowledge
base, lexicon, functional-semantic structures, form hierarchy, and
form), shown for the input "Can I add my annual $15.00 ACL dues
to these membership fees?"]
Natural-language input of the user containing new facts relevant
for tax adjustment, as well as data entered directly into the
form, causes structures of the general part to be individualized.
Individualized concepts (depicted by ovals with lateral strokes 
in Fig. 1) and individualized attribute descriptions are thereby 
created. In Fig. 1, the individualized structures express the facts 
that the user spent $80 and $40 as professional organization and 
charitable organization membership fees, respectively. 
Concepts and roles can be linked to elements in the form
hierarchy if they conceptually correspond to a region in the form.
In Fig. 1, for instance, the concept 'NUMBER' is associated
with regions R16 and R34, amongst others, and the concept
'PROF.ORGAN.MEMB.FEE' with region AR48.
2.4. The functional-semantic structure
Before individualizations of the conceptual knowledge base are
created, the natural-language input of the user is first mapped
onto individualizations of the so-called functional-semantic
structure (FSS). The task of the FSS (cf. Allgayer & Reddig
1986) is to express the syntactic and semantic relationships
between the constituents of the input sentence. It is also
represented in a KL-ONE-like scheme. Amongst other things, the
word stem entries in the lexicon determine which parts of the
FSS are to be individualized. During this process, information
about the location and the type of the occurring pointing
gestures is assigned to the noun phrases to which they belong. Fig.
1 shows part of the individualized FSS generated by the input
sentence.
The FSS forms the starting point for the referential analysis of
the natural-language input, i.e. the mapping onto individualized
structures of the conceptual knowledge base. This task is
performed by an interpreter using appropriate mapping rules.
2.5. The dialog memory 
Our current provisional approach is to regard the dialog memory
as a structured list containing individualizations of the
concepts in the conceptual knowledge base. When a referent
is recognized as not having been mentioned before (because
it is not contained in the dialog memory), the respective
concept is individualized, linked to the referent, and entered
as the most relevant element of the dialog memory. In
Fig. 1 we assume that regions R16, R34, AR48 and AR51,
amongst others, have been addressed before. Thus the concepts
PROF.ORG.MEMB.FEE, CHAR.ORG.MEMB.FEE
and NUMBER have been individualized and linked to these
regions.
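A dialog memory of this kind can be pictured as a list ordered by relevance. The sketch below is only an illustration under assumptions: the class `DialogMemory` and its methods are hypothetical names, and moving a re-mentioned entry back to the front is an assumption, since the text only describes what happens when a referent is mentioned for the first time.

```python
class DialogMemory:
    """Relevance-ordered list of (concept, region) pairs (hypothetical)."""

    def __init__(self):
        self._entries = []  # index 0 is the most relevant element

    def mention(self, concept, region):
        """Enter an individualized concept as the most relevant element."""
        entry = (concept, region)
        if entry in self._entries:
            self._entries.remove(entry)  # assumed behavior for re-mentions
        self._entries.insert(0, entry)

    def rank(self, region):
        """Return the position of the region in the list, or None if absent."""
        for position, (_, linked_region) in enumerate(self._entries):
            if linked_region == region:
                return position
        return None


memory = DialogMemory()
memory.mention("CHAR.ORG.MEMB.FEE", "AR51")
memory.mention("NUMBER", "R34")
memory.mention("PROF.ORG.MEMB.FEE", "AR48")
memory.mention("NUMBER", "R16")

print(memory.rank("R16"), memory.rank("AR48"), memory.rank("R18"))
```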
3. Referent identification processes 
In a user's NL input, a deictic can be used at any position where
a noun phrase or a (locative) adverbial phrase is to be expected. 
From a syntactic point of view, a deictic can serve two functions: 
- it supplements a syntactically saturated description, i.e. takes 
the form of an additional attribute. 
- it replaces a syntactically obligatory constituent (e.g. the head 
of a noun phrase). 
The position of a deictic may be before, within, or after a noun
phrase. Syntactic vicinity is taken into account if an ambiguity 
occurs in embedded noun phrases. 
In the XTRA system, four sources of information are utilized
in order to identify the referent of a deictic expression: The
location of the user's pointing gesture, the descriptor s/he uses,
case frame restrictions, and the contents of the dialog memory.
The first three sources can be found in the functional-semantic
structure, the last one in the individualized part of
the conceptual knowledge base. Referent identification, then,
is performed in the following order:
a) Generation of potential referents by the most appropriate
knowledge source. Source-specific partial plausibility values
are thereby assigned to each generated candidate. Only
deictic, descriptor and case frame are considered in this step;
the dialog memory is only used in step (b).
b) Re-evaluation of each candidate by consecutively considering
the information from all other knowledge sources.
c) Overall evaluation by considering all partial plausibility
assignments; selection of the candidate with the highest
plausibility factor.
In the following section we will describe how the most
appropriate knowledge source for referent generation is selected and
how referential candidates are generated. Since we are
particularly concerned with referent identification through pointing
gestures, we will only describe the referent generation strategy
of the deixis analyzer (also cf. Allgayer 1986). For generating
candidates through descriptors and case frames, we use the
"classical" way leading from the lexicon via the FSS over to
individualized concepts in the conceptual knowledge base and to
the form hierarchy. In section 3.2., we then describe how the
deixis analyzer re-evaluates candidates supplied by descriptor
and case frame analysis, and how candidates generated by the
deixis analyzer are re-evaluated by considering the information
of all other knowledge sources. The example depicted in Fig.
1, to which we constantly refer in the upcoming section, was
chosen to demonstrate that, in many cases, all or nearly all
of these knowledge sources are necessary to correctly identify a
referent.
3.1. Generating potential referents 
3.1.1. Deciding for the most appropriate knowledge 
source 
In order to restrain the computational complexity of the
identification process, it must be decided first whether referential
candidates should be generated by analyzing the pointing gesture,
the descriptor, or the case frame of the user's input. To ensure
that only a small number of candidates must be re-evaluated
in the subsequent steps, it is certainly advisable to choose the
knowledge source which yields the smallest set of plausible
candidates that still contains the referent. The evaluation of each
knowledge source is performed according to the following
criteria:
- Deixis: The quality of a user's deictic for candidate generation
is inversely proportional to the number of regions contained
in the demonstratum and the number of ancestors of
the demonstratum. A deictic to R3 in Fig. 1, for instance,
will yield fewer candidates than a deictic to R34.
- Descriptor: If a descriptor does not contain a head, it cannot
be used for candidate generation. Otherwise, its quality is
inversely proportional to the number of subconcepts of its
conceptual representation and the number of regions linked
to these concepts. E.g., for the representation in Fig. 1, the
descriptor 'number' will yield by far more candidates than
the descriptor 'membership fee'.
- Case frame: The quality of a case restriction for referent
generation depends on the quality of the selection restriction
concept of the corresponding role in the conceptual knowledge
base. This quality can be computed in the manner described
above for descriptors. In Fig. 1, the selection restrictions for
the ADD concept do not seem to be profitable for candidate
generation.
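The criteria above amount to estimating, for each knowledge source, how many candidates it would generate, and then choosing the cheapest usable source. The following Python sketch illustrates this under assumptions: the function names are hypothetical, simply adding the two counts is an assumed stand-in for the unspecified "inversely proportional" measure, and all numbers are made up.

```python
def deixis_cost(contained_regions, ancestors):
    """More contained regions and more ancestors mean more candidates."""
    return contained_regions + ancestors


def descriptor_cost(has_head, subconcepts, linked_regions):
    """A descriptor without a head cannot generate candidates at all."""
    if not has_head:
        return None
    return subconcepts + linked_regions


def choose_source(costs):
    """Pick the usable knowledge source with the lowest estimated cost."""
    usable = {source: cost for source, cost in costs.items() if cost is not None}
    return min(usable, key=usable.get)


costs = {
    "deixis": deixis_cost(contained_regions=4, ancestors=3),
    "descriptor": descriptor_cost(True, subconcepts=2, linked_regions=2),
    "case_frame": 12,  # illustrative value for a very general restriction
}
print(choose_source(costs))
```

The case frame cost would be computed like the descriptor cost, applied to the selection restriction concept of the relevant role.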
3.1.2. Generating candidates by analyzing the user's 
pointing gesture 
As was mentioned above, our system allows for the use of sev- 
eral types of deictic gestures differing in precision. A so-called 
deictic field is associated with each type of pointing gesture, its
size corresponding to the degree of exactness of the deictic. An 
example for three different types of pointing gestures is given 
in Fig. 2. 
[Fig. 2: Three types of pointing gestures]
A deictic field may either be completely contained in a basic
region (as is the case for deictic 1 in Fig. 2) or overlap two or
more basic regions (deictics 2 and 3, respectively). All basic
regions that are overlapped by a deictic field serve as first
referential candidates in our system. The ratio of the part of a
region covered by a deictic field to the size of the total
region yields the plausibility value for the region. Deictic 3, for
instance, generates R18, R16, R17 and R15 as first candidates,
in order of descending plausibility (cf. Allgayer 1986).
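The coverage ratio can be made concrete with a short Python sketch. It assumes that both the regions and the deictic field are axis-aligned rectangles given as (x0, y0, x1, y1); the actual shape of the deictic field is not specified here, and the layout and coordinates below are invented so that the resulting order matches the one stated for deictic 3.

```python
def area(rect):
    """Area of a rectangle given as (x0, y0, x1, y1)."""
    return (rect[2] - rect[0]) * (rect[3] - rect[1])


def overlap_area(a, b):
    """Area of the intersection of two rectangles (0 if they are disjoint)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0) * max(height, 0)


def first_candidates(deictic_field, regions):
    """Map each overlapped region to covered area / total area, best first."""
    scored = {}
    for name, rect in regions.items():
        covered = overlap_area(deictic_field, rect)
        if covered > 0:
            scored[name] = covered / area(rect)
    return dict(sorted(scored.items(), key=lambda item: item[1], reverse=True))


# Hypothetical layout of four basic regions and one large deictic field.
regions = {
    "R15": (0, 0, 10, 2),
    "R16": (10, 0, 14, 2),
    "R17": (0, 2, 10, 4),
    "R18": (10, 2, 14, 4),
}
deictic_field = (7, 1, 14, 4)

print(first_candidates(deictic_field, regions))
```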
In a second step, the system accounts for the possibility of
pars-pro-toto deixis. All regions semantically or geometrically
superordinated to any of the current candidates are also considered
as candidates. The plausibility assignment of a superordinated
region depends on its type, the plausibility of its candidate
sub-regions, and the type of pointing gesture employed by the user
(the vaguer the pointing gesture, the higher the plausibility
of the superordinated regions). In Fig. 2, regions AR49 and
AR48 would be added in the case of deictic 3, both with higher
plausibility than any of the first candidates. This upward
propagation through the hierarchy can be applied iteratively,
yielding even more candidates (the valuation function smoothly
declines thereby to render high-level regions less plausible). The
resulting set of candidates has to be re-evaluated by the processes
described below.
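One possible reading of this upward propagation is sketched below in Python. The valuation function itself is not given in the text, so everything numeric here is an assumption: a father inherits the best score of its candidate sons multiplied by a gesture-dependent boost, each further level is damped by a decay factor, and the region type is ignored. The names `propagate`, `gesture_boost` and `decay`, as well as the father R2 of AR48, are hypothetical.

```python
def propagate(first_candidates, fathers, gesture_boost, decay=0.8, levels=2):
    """Add superordinated regions to the candidate set (illustrative only)."""
    plausibility = dict(first_candidates)
    frontier = dict(first_candidates)
    factor = gesture_boost  # vaguer gestures get a larger boost
    for _ in range(levels):
        raised = {}
        for region, score in frontier.items():
            for father in fathers.get(region, ()):
                raised[father] = max(raised.get(father, 0.0), score * factor)
        for region, score in raised.items():
            plausibility[region] = max(plausibility.get(region, 0.0), score)
        frontier = raised
        factor = decay  # higher levels become gradually less plausible
    return plausibility


# Hypothetical excerpt of the hierarchy and of the first candidates.
fathers = {"R15": ["AR48"], "R16": ["AR48"], "AR48": ["R2"]}
first = {"R16": 0.5, "R15": 0.15}

print(propagate(first, fathers, gesture_boost=1.2))
```

With these made-up numbers, AR48 ends up above both of its sons, while the next level up is less plausible again.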
3.2. Re-evaluating the set of candidates
3.2.1. Re-evaluation by analysis of the pointing gesture 
If the optimization process of section 3.1.1. decided that
descriptor or case frame analysis was the most appropriate
knowledge source for candidate generation, analysis of the deictic is
employed in our system for re-evaluating the candidates
supplied by these components. This evaluation process is rather
similar to the candidate generation described above. For example,
see Fig. 1 (we assume that the deictic in this example is the same
as deictic 3 in Fig. 2): If the descriptor analyzer generated AR48,
AR51, R16 and R34 as potential referents (since the descriptor
was 'membership fee', see below), the deixis component would
assign high plausibility values to AR48 and R16, and very low ones
to AR51 and R34.
3.2.2. Re-evaluation by descriptor analysis 
This process determines to what extent the conceptual represen- 
tation of the descriptor applies to the current candidates. Each 
candidate is tested as to whether the representation of the de- 
scriptor, a subconcept of this representation, or (if existent) the 
restriction concept of the value slot of one of these concepts 
is linked to the candidate. The more concepts in between the
representation of the descriptor and the linked subconcept, the
lower the new partial plausibility assignment. Let us assume for
our example in Fig. 1 that the deixis analyzer, in order of
decreasing plausibility, has generated regions AR49, AR48, R18,
R16, R17 and R15 as potential referents. If the descriptor
is 'these membership fees', the descriptor analysis will prefer
AR48 and R16, since a subconcept of the representation of this
descriptor is linked to AR48, and the restriction concept of its
value slot is linked to R16.
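The distance-based part of this test can be sketched as a breadth-first walk down the concept hierarchy. The Python code below is a simplified illustration, not the mapping used in XTRA: it only follows subconcept links and leaves out the restriction concepts of value slots, the score 1 / (1 + distance) is an assumed valuation, and the link from CHAR.ORG.MEMB.FEE to AR51 is inferred from the running example rather than stated explicitly.

```python
from collections import deque


def descriptor_plausibility(descriptor, subconcepts, links):
    """Score regions linked to the descriptor concept or to its subconcepts."""
    scores = {}
    visited = {descriptor}
    queue = deque([(descriptor, 0)])
    while queue:
        concept, distance = queue.popleft()
        for region in links.get(concept, ()):
            # More concepts in between means a lower plausibility.
            scores[region] = max(scores.get(region, 0.0), 1.0 / (1 + distance))
        for child in subconcepts.get(concept, ()):
            if child not in visited:
                visited.add(child)
                queue.append((child, distance + 1))
    return scores


# Hypothetical excerpt of the concept hierarchy and of its links to regions.
subconcepts = {"MEMBERSHIP FEE": ["PROF.ORG.MEMB.FEE", "CHAR.ORG.MEMB.FEE"]}
links = {"PROF.ORG.MEMB.FEE": ["AR48"], "CHAR.ORG.MEMB.FEE": ["AR51"]}

print(descriptor_plausibility("MEMBERSHIP FEE", subconcepts, links))
```

Candidates that do not appear in the result, such as AR49 in the example, would receive the lowest partial plausibility.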
3.2.3. Re-evaluation by case frame analysis 
This process determines to what extent the selection restriction 
concept of the respective slot in the conceptual representation 
of the verb applies to the referential candidates under investiga- 
tion. This evaluation process is performed almost identically to 
that of the descriptor. In our example, high plausibility would 
be attributed to regions R16 and R18, since the concept NUM- 
BER (the restriction concept of the relevant slot of the concept 
ADD) is linked to these regions. 
3.2.4. Restriction by dialog memory 
This process determines whether a referent has recently been
mentioned by checking whether or not an individualized concept
connected with it is contained in the dialog memory. The
higher the position of such an individualized concept in the list,
the higher the plausibility of the candidate. In Fig. 1, we assume
that both the professional and the charitable society memberships
and their values have been addressed just recently. Therefore,
in our example, high plausibility values are assigned to
regions R16 and AR48. The overall evaluation will then select
R16, since it has obtained the highest total plausibility.
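The three steps of referent identification (generation, re-evaluation, overall evaluation) can be summarized in a short Python sketch. How the partial plausibility values are combined is not specified in the text, so the arithmetic mean used here is an assumption, as are the function name `identify_referent` and all numeric values; only the regions preferred by each knowledge source follow the running example.

```python
from statistics import mean


def identify_referent(generated, reevaluators):
    """Return the best candidate and the overall plausibility of each one.

    generated: region -> partial plausibility from the generating source.
    reevaluators: source name -> function mapping a region to a plausibility.
    """
    partial = {region: {"generator": p} for region, p in generated.items()}  # (a)
    for source, rate in reevaluators.items():                                # (b)
        for region in partial:
            partial[region][source] = rate(region)
    overall = {region: mean(parts.values()) for region, parts in partial.items()}  # (c)
    best = max(overall, key=overall.get)
    return best, overall


# Candidates of the deixis analyzer in descending order (values invented).
generated = {"AR49": 0.9, "AR48": 0.8, "R18": 0.5,
             "R16": 0.4, "R17": 0.3, "R15": 0.2}


def lookup(table, default=0.1):
    """Build a rating function from a table of preferred regions."""
    return lambda region: table.get(region, default)


reevaluators = {
    "descriptor": lookup({"AR48": 0.9, "R16": 0.8}),
    "case_frame": lookup({"R16": 0.9, "R18": 0.9}),
    "dialog_memory": lookup({"R16": 0.9, "AR48": 0.8}),
}

best, overall = identify_referent(generated, reevaluators)
print(best)
```

With these illustrative values, R16 is the only candidate supported by all four knowledge sources and therefore obtains the highest overall plausibility.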
4. Discussion 
Our system demonstrates that spatial deixis is a valuable source 
of information for identifying referents which can also be
investigated and utilized in natural language dialog systems with
pictorial display. Three reasons sum up the advantages of using
pointing gestures: They save the speaker the generation,
and the hearer the analysis of complex referential descriptions 
and thus simplify the natural-language dialog; they often allow 
for reference in situations in which linguistic reference is sim- 
ply not possible (think of referring to one out of a dozen similar 
objects); and they permit the speaker to be vague, imprecise, or 
ambiguous, and to use everyday terms instead of precise tech- 
nical terms unknown to him/her. 
In natural-language dialog systems, deixis analysis can be com- 
bined well with standard methods for referent identification. 
Some of the identification processes (e.g., tests with case frame,
descriptor and dialog memory) are rather similar to the classical
methods used for anaphora and ellipsis resolution. Others,
such as the generation and evaluation of candidates by the deixis
analyzer, are specific to this particular kind of conversational
medium.
It should be pointed out, however, that our environment for
spatial deixis is, in several ways, somewhat simpler than those
occurring in person-to-person dialogs (cf. Schmauks 1986).
The deictic field is only two-dimensional, and the objects that
can be pointed at are clearly separated from each other. Compared
to real-life situations, the number of possible referents is
relatively small. "Left" and "right" mean the same thing for
the user and the system (which is not the case, e.g., in
face-to-face conversation). However, this relative simplicity need
not be a drawback. Instead, one might regard our environment as
a study in vitro, eliminating a number of uncertainty factors so
that the essential characteristics of spatial deixis become more
salient.
Another question is whether the deictic behavior of subjects who
use a pointing device is the same as that of subjects who touch
the display with their fingers (and thus, whether deixis via a
pointing device is a valid simulation of tactile deixis). One
might argue, e.g., that people point more precisely with a mouse
than with their fingers, or vice versa. We are currently
conducting an informal experiment to answer these questions. In any
case, only the propagation functions are perhaps affected by a
change of the deictic medium, whereas the referent identification
processes will remain the same.
Attempts are currently being made to also integrate visual and
conceptual salience in our model (cf. Clark et al. 1983). When a
pointing gesture is ambiguous, it appears that regions set off by
bold frame or coloring, as well as regions containing important
data for the task domain are preferred. We expect this
preference to be taken into account in the evaluation processes of
the deixis analyzer. Another possible extension which we would
like to investigate is replacing the strategy described in section
3.1.1. by a certain form of incremental referent identification.
There is strong empirical evidence (e.g. Goodman 1985) that
people begin with referent identification immediately after
receiving initial information about it, instead of waiting until the
speaker's referential act is terminated. Since all components
described above are strictly separated, it appears basically
possible to also use them in an incremental identification process.
In one-processor systems, however, great care must be taken that
the knowledge source first addressed does not block the system
by generating too many candidates. Therefore, some process
control will be necessary, either by resource limitation or
by taking into account the heuristics listed in section 3.1.1.

References

Allgayer, J. (1986): Eine Graphikkomponente zur Integration
von Zeigehandlungen in natürlichsprachliche KI-Systeme.
16. GI-Jahrestagung, Berlin, FRG (in print).

Allgayer, J. and C. Reddig (1986): Systemkonzeption zur
Verarbeitung kombinierter sprachlicher und gestischer
Referentenbeschreibungen. SFB 314, Dept. of Computer
Science, University of Saarbrücken, FR Germany.

Brachman, R. J. (1978): A Structural Paradigm for
Representing Knowledge. Report No. 3605, Bolt, Beranek and
Newman Inc., Cambridge, MA.

Bühler, K. (1982): The Deictic Field of Language and
Deictic Words. Abridged translation of K. Bühler (1934):
Sprachtheorie, part 2, chapters 7 and 8. In: R. J. Jarvella
and W. Klein, eds.: Speech, Place, and Action.
Chichester etc.: Wiley.

Clark, H. H. and C. R. Marshall (1981): Definite Reference
and Mutual Knowledge. In: A. K. Joshi, B. L. Webber
and I. A. Sag, eds.: Elements of Discourse Understanding.
Cambridge: Cambridge Univ. Press.

Clark, H. H., R. Schreuder and S. Buttrick (1983): Common
Ground and the Understanding of Demonstrative
Reference. Journal of Verbal Learning and Verbal Behavior
22, 245-258.

Cohen, P. R. (1981): The Need for Referent Identification as
a Planned Action. Proceedings of the 7th International
Joint Conference on Artificial Intelligence, Vancouver,
Cda., 31-36.

Cohen, P. R. (1984): The Pragmatics of Referring and the 
Modality of Communication. Computational Linguistics 
10, 97-146. 

Goodman, B. A. (1985): Repairing Reference Identification
Failures by Relaxation. Proceedings of the 23rd ACL
Meeting, Chicago, IL, 204-217.

Grosz, B. J. (1981): Focusing and Description in Natural
Language Dialogues. In: A. K. Joshi, B. L. Webber and
I. A. Sag, eds.: Elements of Discourse Understanding.
Cambridge: Cambridge Univ. Press.

Phillips, B., M. J. Freiling, J. H. Alexander, S. L. Messick,
S. Rehfuss and S. Nicholl (1985): An Eclectic Approach
to Building Natural Language Interfaces. Proceedings of
the 23rd ACL Meeting, Chicago, IL, 254-261.

Lipkis, Thomas (1982): A KL-ONE Classifier. Proceedings 
of the 1981 KL-ONE Workshop. Report No. 4842, Bolt, 
Beranek and Newman Inc., Cambridge, MA, 128-145. 

Reichman, R. (1981): Plain Speaking: A Theory and Gram- 
mar of Spontaneous Discourse. Report No. 4681, Bolt, 
Beranek and Newman Inc., Cambridge, MA. 

Schmauks, D. (1986): Formulardeixis und ihre Simulation
auf dem Bildschirm. Ein Überblick aus linguistischer
Sicht. Memo No. 4, Sonderforschungsbereich 314, Dept.
of Computer Science, University of Saarbrücken, FRG.

Woods, W. A., R. J. Brachman, R. J. Bobrow, R. R. Cohen
and J. W. Klovstad (1979): Research in Natural Language
Understanding: Annual Report. TR 4274, Bolt,
Beranek & Newman, Cambridge, MA.
