Proceedings of the Fourth International Natural Language Generation Conference, pages 73–80,
Sydney, July 2006. c©2006 Association for Computational Linguistics
Group-based Generation of Referring Expressions
Funakoshi Kotaro
∗
Watanabe Satoru
†
Department of Computer Science, Tokyo Institute of Technology
Tokyo Meguro
ˆ
Ookayama 2-12-1, 152-8552, Japan
take@cl.cs.titech.ac.jp
Tokunaga Takenobu
Abstract
Past work of generating referring expres-
sions mainly utilized attributes of objects
and binary relations between objects in or-
der to distinguish the target object from
others. However, such an approach does
not work well when there is no distinc-
tive attribute among objects. To over-
come this limitation, this paper proposes
a novel generation method utilizing per-
ceptual groups of objects and n-ary re-
lations among them. The evaluation us-
ing 18 subjects showed that the proposed
method could effectively generate proper
referring expressions.
1 Introduction
In the last two decades, many researchers have
studied the generation of referring expressions to
enable computers to communicate with humans
about objects in the world.
In order to refer to an intended object (the tar-
get) among others (distractors), most past work
(Appelt, 1985; Dale and Haddock, 1991; Dale,
1992; Dale and Reiter, 1995; Heeman and Hirst,
1995; Horacek, 1997; Krahmer and Theune, 2002;
van Deemter, 2002; Krahmer et al., 2003) utilized
attributes of the target and binary relations be-
tween the target and distractors. Therefore, these
methods cannot generate proper referring expres-
sions in situations where there is no significant
surface difference between the target and distrac-
tors, and no binary relation is useful to distinguish
the target. Here, a proper referring expression
∗
Currently at Honda Research Institute Japan Co., Ltd.
†
Currently at Hitachi, Ltd.
means a concise and natural linguistic expression
enabling hearers to identify the target.
Forexample, consider indicating object b to per-
son P in the situation of Figure 1. Note that la-
bels a, b and c are assigned for explanation to the
readers, and person P does not share these labels
with the speaker. Because object b is not distin-
guishable from objects a or c by means of their
appearance, one would try to use a binary relation
between object b and the table, i.e., “a ball to the
right of the table”. However, “to the right of” is
not a discriminatory relation, for objects a and c
are also located to the right of the table. Using a
and c as a reference object instead of the table does
not make sense, since a and c cannot be uniquely
identified because of the same reason that b cannot
be identified. Such situations have drawn less at-
tention (Stone, 2000), but can frequently occur in
some domains such as object arrangement (Tanaka
et al., 2004).
P
a
b
c
Table
Figure 1: An example of problematic situations
Inthe situation ofFigure 1, the speaker can indi-
cate object b to person P with a simple expression
“the front ball”. In order to generate such an ex-
pression, one must be able to recognize the salient
perceptual group of the objects and use the n-ary
relative relations in the group.
73
To overcome the problem described above, Fu-
nakoshi et al. (2004) proposed a method of gen-
erating Japanese referring expressions that utilizes
n-ary relations among members of a group. They,
however, dealt with the limited situations where
only homogeneous objects are randomly arranged
(see Figure 2). Thus, their method could han-
dle only spatial n-ary relation, and could not han-
dle attributes and binary relations between objects
which have been the main concern of the past re-
search.
In this paper, we extend the generation method
proposed by (Funakoshi et al., 2004) so as to han-
dle object attributes and binary relations between
objects as well. In what follows, Section 2 shows
an extension of the SOG representation that was
proposed in (Funakoshi et al., 2004). Our new
method will be described in Section 3 and eval-
uated in Section 4. Finally we conclude the paper
in Section 5.
2 SOG representation
Funakoshi et al. (2004) proposed an intermedi-
ate representation between a referring expression
and the situation that is referred to by the expres-
sion. The intermediate representation represents
a course of narrowing down to the target as a se-
quence of groups from the group of all objects to
the singleton group of the target object. Thus it is
called SOG (Sequence Of Groups).
The following example shows an expression de-
scribing the target x in Figure 2 with the cor-
responding SOG representation below it. Since
Japanese is a head-final language, the order of
groups in the SOG representation can be retained
in the linguistic expression.
hidari oku ni aru
(1)
mittu no tama no uti no
(2)
itiban migi no tama
(3)
(the rightmost ball
(3)
among the three balls
(2)
at the back left
(1)
)
SOG:[{a,b,c,d,e,f,x},{a,b,x},{x}],
where{a,b,c,d,e,f,x}denotes all objects in
the situation,{a,b,x}denotes the three objects
at the back left, and {x}denotes the target.
2.1 Extended SOG
As mentioned above, (Funakoshi et al., 2004) sup-
posed the limited situations where only homoge-
neous objects are randomly arranged, and consid-
ered only spatial subsumption relations between
consecutive groups. Therefore, relations between
P
a
b
e
f
c d
x
Figure 2: An example from (Funakoshi et al.,
2004)
groups are not explicitly denoted in the original
SOGs as shown below.
SOG: [G
0
,G
1
,...,G
n
]
G
i
: a group
In this paper, however, other types of relations
between groups are also considered. We propose
an extended SOG representation where types of
relations are explicitly denoted as shown below. In
the rest of this paper, we will refer to this extended
SOG representation by simply saying “SOG”.
SOG: [G
0
R
0
G
1
R
1
...G
i
R
i
...G
n
]
G
i
: a group
R
i
: a relation between G
i
and G
i+1
2.2 Relations between groups
R
i
, a relation between groups G
i
and G
i+1
,de-
notes a shift of attention from G
i
to G
i+1
with
a certain focused feature. The feature can be an
attribute of objects or a relation between objects.
There are two types of relations between groups:
intra-group relation and inter-group relation.
Intra-group relation When R
i
is an intra-group
relation, G
i
subsumes G
i+1
,thatis,G
i
⊃ G
i+1
.
Intra-group relations are further classified into the
following subcategories according to the feature
used to narrow down G
i
to G
i+1
. We denote these
subcategories with the following symbols.
space
−→ : spatial subsumption
type
−→ : the object type
shape
−→ : the shape of objects
color
−→ : the color of objects
size
−→ : the size of objects
With respect to this classification, (Funakoshi et
al., 2004) dealt with only the
space
−→ relation.
74
Inter-group relation When R
i
is an inter-group
relation, G
i
and G
i+1
are mutually exclusive, that
is, G
i
∩ G
i+1
= φ. An inter-group relation is a
spatial relation and denoted by symbol ⇒.
Example R
i
can be one of
space
−→ ,
type
−→,
shape
−→ ,
color
−→,
size
−→ and ⇒. We show a referring expres-
sion indicating object b1 and the corresponding
SOG in the situation of Figure 3. In the SOG,
{all} denotes the total set of objects in the situ-
ation. The indexed underlines denote correspon-
dence between SOG and linguistic expressions.
As shown in the figure, we allow objects being on
the other objects.
marui
(1)
futatu no tukue no uti no
(2)
hidari no
(3)
tukue no
(4)
ue no
(5)
tama
(6)
(the ball
(6)
on
(5)
the left
(3)
table
(4)
among the two
(2)
round
(1)
tables
(2)
)
SOG: [{all}
type
−→ {t1,t2,t3}
shape
−→
(1)
{t1,t2}
(2)
space
−→
(3)
{t1}
(4)
⇒
(5)
{b1}
(6)
]
b2
b1
b5
t3
t2
p1
t1
b3
b4
blue
black
red
Figure 3: An example situation
3 Generation
Our generation algorithm proposed in this section
consists of four steps: perceptual grouping, SOG
generation, surface realization and scoring. In the
rest of this section, we describe these four steps by
using Figure 3 as an example.
3.1 Step 1: Perceptual grouping
Our algorithm starts with identifying groups of
objects that are naturally recognized by humans.
We adopt Th´orisson’s perceptual grouping algo-
rithm (Th´orisson, 1994) for this purpose. Per-
ceptual grouping is performed with objects in the
situation with respect to each of the following
features: type, shape, color, size,andproxim-
ity. Three special features, total, singleton,and
closure are respectively used to recognize the to-
tal set of objects, groups containing each single
object, and objects bounded in perceptually sig-
nificant regions (table tops in the domain of this
paper). These three features are handled not by
Th`orisson’s algorithm but by individual proce-
dures.
Type is the most dominant feature because hu-
mans rarely recognize objects of different types as
a group. Thus, first we group objects with respect
to types, and then group objects of the same type
with respect to other features (except for total).
Although we adopt Th´orisson’s grouping algo-
rithm, we use different grouping strategies from
the original. Th´orisson (1994) lists the following
three combinations of features as possible strate-
gies of perceptual grouping.
• shape and proximity
• color and proximity
• size and proximity
However, these strategies are inappropriate to gen-
erate referring expressions. For example, because
two blue balls b1 and b2 in Figure 3 are too
much distant from each other, Th´orisson’s algo-
rithm cannot recognize the group consisting of b1
and b2 with the original strategies. However, the
expression like “the left blue ball” can naturally
refer to b1. When using such an expression, we
assume an implicit group consisting of b1 and b2.
Hence, we do not combine features but use them
separately.
The results of perceptual grouping of the situa-
tion in Figure 3 are shown below. Relation labels
are assigned to recognized groups with respect to
features used in perceptual grouping. We define
six labels: all, type, shape, color, size,and
space. Features singleton, proximity and closure
share the same label space. A group may have
several labels.
feature label recognized groups
total all {t1,t2,t3,p1,b1,b2,b3,b4,b5}
singleton space {t1}, {t2}, {t3}, {p1}, {b1}, {b2},
{b3}, {b4}, {b5}
type type {t1,t2,t3},{p1},{b1,b2,b3,b4,b5}
shape shape {t1,t2},{t3}
color color {b1,b2},{b3},{b4,b5}
size size {b1,b3,b4},{b2,b5}
proximity space {t2,t3},{b1,b3,b4,b5},{b3,b4,b5}
closure space {b1},{b3,b4}
75
Target # target object
AllGroups # all generated groups
SOGList # list of generated SOGs
01:makeSOG()
02: SOG = []; # list of groups and symbols
03: All = getAll(); # total set
04: add(All, SOG); # add All to SOG
05: TypeList = getAllTypes(All);
# list of all object types
06: TargetType = getType(Target);
# type of the target
07: TargetSailency = saliency(TargetType);
# saliency of the target type
08: for each Type in TypeList do
# {Table, Plant, Ball}
09: if saliency(Type) ≥
TargetSaliency then
# saliency: Table > Plant > Ball
10: Group = getTypeGroup(Type);
# get the type group of Type
11: extend(SOG, Group);
12: end if
13: end for
14:return
Figure 4: Function makeSOG
3.2 Step 2: SOG generation
The next step is generating SOGs. This is so-
called content planning in natural language gen-
eration. Figure 4, Figure 5 and Figure 6 show the
algorithm of making SOGs.
Three variables Target, AllGroups,and
SOGListdefined in Figure 4 are global variables.
Target holds the target object which the refer-
ring expression refers to. AllGroups holds the
set of all groups recognized in Step 1. Given
Target and AllGroups, function makeSOG
enumerates possible SOGs in the depth-first man-
ner, and stores them in SOGList.
makeSOG(Figure 4) makeSOGstarts with a list
(SOG) that contains the total set of objects in the
domain. It chooses groups of objects that are more
salient than or equal to the target object and calls
function extendfor each of the groups.
extend (Figure 5) Given an SOG and a group
to be added to the SOG,function extendextends
the SOG with the group for each label attached to
the group. This extension is done by creating a
copy of the given SOG and adding to its end an
intra-group relation symbol definedinSection2.2
corresponding to the given label and group. Fi-
nally it calls searchwith the copy.
search(Figure 6) This function takes an SOG
as its argument. According to the last group in
01:extend(SOG, Group)
02: Labels = getLabels(Group);
03: for each Label in Labels do
04: SOGcopy = copy(SOG);
05: add(
Label
−→, SOGcopy);
06: add(Group, SOGcopy);
07: search(SOGcopy);
08: end for
09:return
Figure 5: Function extend
the SOG (LastGroup), it extends the SOG as
described below.
1. If LastGroup is a singleton of the target
object, append SOG to SOGListand return.
2. If LastGroup is a singleton of a non-target
object, find the groups that contain the target
object and satisfy the following three condi-
tions: (a), (b) and (c).
(a) All objects in the group locate in
the same direction from the object of
LastGroup (the reference). Possi-
ble directions are one of “back”, “back
right”, “right”, “front right”, “front”,
“front left”, “left”, “left back” and “on”.
The direction is determined on the basis
of coordinate values of the objects, and
is assigned to the group for the use of
surface realization.
(b) There is no same type object located be-
tween the group and the reference.
(c) The group is not a total set of a certain
type of object.
Then, for each of the groups, make a copy
of the SOG, and concatenate “⇒”andthe
group to the copy, and call search recur-
sively with the new SOG.
3. If LastGroup contains the target object
together with other objects, let the inter-
section of LastGroup and each group in
AllGroups be NewG, and copy the label
from each group to NewG.IfNewG contains
the target object, call function extend un-
less Checkedcontains NewG.
4. If LastGroup contains only non-target ob-
jects, call function extend for each group
(Group)inAllGroupswhich is subsumed
by LastGroup.
Figure 7 shows the SOGs generated to refer to
object b1 in Figure 3.
76
1. [{all}
type
−→ {t1,t2,t3}
space
−→ {t1}⇒{b1}]
2. [{all}
type
−→ {t1,t2,t3}
shape
−→ {t1,t2}
space
−→ {t1}⇒{b1}]
3. [{all}
type
−→ {b1,b2,b3,b4,b5}
space
−→ {b1}]
4. [{all}
type
−→ {b1,b2,b3,b4,b5}
color
−→ {b1,b2}
space
−→ {b1}]
5. [{all}
type
−→ {b1,b2,b3,b4,b5}
color
−→ {b1,b2}
size
−→ {b1}]
6. [{all}
type
−→ {b1,b2,b3,b4,b5}
size
−→ {b1,b4,b3}
space
−→ {b1}]
7. [{all}
type
−→ {b1,b2,b3,b4,b5}
size
−→ {b1,b4,b3}
color
−→ {b1}]
8. [{all}
type
−→ {b1,b2,b3,b4,b5}
space
−→ {b1,b3,b4,b5}
space
−→ {b1}]
9. [{all}
type
−→ {b1,b2,b3,b4,b5}
space
−→ {b1,b3,b4,b5}
color
−→ {b1}]
10. [{all}
type
−→ {b1,b2,b3,b4,b5}
space
−→ {b1,b3,b4,b5}
size
−→ {b1,b3,b4}
space
−→ {b1}]
11. [{all}
type
−→ {b1,b2,b3,b4,b5}
space
−→ {b1,b3,b4,b5}
size
−→ {b1,b3,b4}
color
−→ {b1}]
Figure 7: Generated SOGs from the situation in Figure 3
01:search(SOG)
02: LastGroup = getLastElement(SOG);
# get the rightmost group in SOG
03: Card = getCardinality(LastGroup);
04: if Card == 1 then
05: if containsTarget(LastGroup) then
# check if LastGroup contains
# the target
06: add(SOG, SOGList);
07: else
08: GroupList =
searchTargetGroups(LastGroup);
# find groups containing the target
09: for each Group in GroupList do
10: SOGcopy = copy(SOG);
11: add(⇒, SOGcopy);
12: add(Group, SOGcopy);
13: search(SOGcopy);
14: end for
15: end if
16: elsif containsTarget(LastGroup) then
17: Checked = [ ];
18: for each Group in AllGroups do
19: NewG = Intersect(Group, LastGroup);
# make intersection
20: Labels = getLabels(Group);
21: setLabels(Labels, NewG);
# copy labels from Group to NewG
22: if containsTarget(NewG) &
!contains(Checked, NewG) then
23: add(NewG, Checked);
24: extend(SOG, Group);
25: end if
26: end for
27: else
28: for each Group of AllGroups do
29: if contains(LastGroup, Group) then
30: extend(SOG, Group);
31: end if
32: end for
33: end if
34:return
Figure 6: Function search
3.3 Step 3: Surface realization
A referring expression is generated by determin-
istically assigning a linguistic expression to each
element in an SOG according to Rule 1 and 2.
As Japanese is a head-final language, simple con-
catenation of element expressions makes a well-
formed noun phrase
1
. Rule 1 generates expres-
sions for groups and Rule 2 does for relations.
Each rule consists of several subrules which are
applied in this order.
[Rule 1]: Realization of groups
Rule 1.1 The total set ({all}) is not realized.
(Funakoshi et al., 2004) collected referring
expressions from human subjects through ex-
periments and found that humans rarely men-
tioned the total set. According to their obser-
vation, we do not realize the total set.
Rule 1.2 Realize the type name for a singleton.
Type is realized as a noun and only for a sin-
gleton because the type feature is used first to
narrow down the group, and the succeeding
groups consist of the same type objects until
reaching the singleton. When the singleton is
not the last element of SOG, particle “no”is
added.
Rule 1.3 The total set of the same type objects is
not realized.
This is because the same reason as Rule 1.1.
Rule 1.4 The group followed by the relation
space
−→
is realized as “[cardinality] [type] no-uti
(among)”, e.g., “futatu-no (two) tukue (desk)
no-uti (among)”. The group followed by
1
Although different languages require different surface
realization rules, we presume perceptual grouping and SOG
generation (Step 1 and 2) are applicable to other languages as
well.
77
the relation ⇒ is realized as “[cardinality]
[type] no”.
When consecutive groups are connected by
other than spatial relations (
space
−→ and ⇒),
they can be realized as asequence of relations
ahead of the noun (type name). For example,
expression “the red ball among big balls” can
be simplified to “the big red ball”.
Rule 1.5 Other groups are not realized.
[Rule 2]: Realization of relations
Rule 2.1 Relation
type
−→ is not realized.
SeeRule1.2.
Rule 2.2 Relations
shape
−→ ,
color
−→ and
size
−→ are real-
ized as the expressions corresponding to their
attribute values. Spatial relations (
space
−→ and
⇒) are realized as follows, where |G
i
| de-
notes the cardinality of G
i
.
Intra-group relation (G
i
space
−→ G
i+1
)
If |G
i
| =2(i.e., |G
i+1
| =1), based on the
geometric relations among objects, generate
one of four directional expressions “{migi,
hidari, temae, oku} no ({right, left, front,
back})”.
If |G
i
|≥3 and |G
i+1
| =1, based on the
geometric relations among objects, generate
one of eight directional expressions “itiban
{migi, hidari, temae, oku, migi temae, hi-
dari temae, migi oku, hidari oku} no ({right,
left, front, back, front right, front left, back
right, back left}-most)” if applicable. If none
of these expressions is applicable, generate
expression “mannaka no (middle)” if appli-
cable. Otherwise, generate one of four ex-
pressions “{hidari, migi, temae, oku} kara
j-banme no (j-th from {left, right, front,
back})”.
If |G
i+1
|≥2, based on the geometric rela-
tions among objects, generate one of eight di-
rectional expressions “{migi, hidari, temae,
oku, migi temae, hidari temae, migi oku, hi-
dari oku} no ({right, left, front, back, front
right, front left, back right, back left})”.
Inter-group relation (G
i
⇒G
i+1
)
|G
i
| =1should hold because of search
in Step 2. According to the direction as-
signed by search, generate one of nine ex-
pressions : “{migi, hidari, temae, oku, migi
temae, hidari temaen, migi oku, hidari oku,
ue} no ({right, left, front, back, front right,
front left, back right, back left, on})”.
Figure 8 shows the expressions generated from
the first three SOGs shown in Figure 7. The num-
bers in the parentheses denote coindexes of frag-
ments between the SOGs and the realized expres-
sions.
3.4 Step 4: Scoring
Weassign ascore toeach expression bytaking into
account the relations used in the expression, and
the length of the expression.
First we assign a cost ranging over [0,1] to each
relation in the given SOG. Costs of relations are
decided as below. These costs conform to the pri-
orities of features described in (Dale and Reiter,
1995).
type
−→ : No cost (to be neglected)
shape
−→ :0.2
color
−→ :0.4
size
−→ : big(est): 0.6, small(est): 0.8, middle: 1.0
space
−→,⇒: Cost functions are defined according to the
potential functions proposed in (Tokunaga
et al., 2005). The cost for relation “on” is
fixed to 0.
Then, the average cost of the relations is calcu-
lated to obtain the relation cost, C
rel
. The cost of
surface length (C
len
) is calculated by
C
len
=
length(expression)
max
i
length(expression
i
)
,
where the length of an expression is measured by
the number of characters.
Using these costs, the score of an expression is
calculated by
score =
1
α×C
rel
+(1− α)×C
len
.
α was set to 0.5 in the following experiments.
4 Evaluation
4.1 Experiments
Weconducted two experiments to evaluate expres-
sions generated by the proposed method.
Both experiments used the same 18 subjects and
the same 20 object arrangements which were gen-
erated automatically. For each arrangement, all
factors (number of objects, positions ofobjects, at-
tributes of objects, and the target object) were ran-
domly decided in advance to conform to the fol-
lowing conditions: (1) the proposed method can
generate more than five expressions for the given
target and (2) more than two other objects exist
which are the same type as the target.
78
1. SOG: [{all}
type
−→ {t1,t2,t3}
space
−→
(1)
{t1}
(2)
⇒
(3)
{b1}
(4)
]
itiban hidari no
(1)
tukue no
(2)
ue no
(3)
tama
(4)
(the ball
(4)
on
(3)
the leftmost
(1)
table
(2)
)
2. SOG: [{all}
type
−→ {t1,t2,t3}
shape
−→
(1)
{t1,t2}
(2)
space
−→
(3)
{t1}
(4)
⇒
(5)
{b1}
(6)
]
marui
(1)
futatu no tukue no uti
(2)
hidari no
(3)
tukue no
(4)
ue no
(5)
tama
(6)
(the ball
(6)
on
(5)
the left
(3)
table
(4)
among
(2)
the round
(1)
two tables
(2)
)
3. SOG: [{all}
type
−→ {b1,b2,b3,b4,b5}
space
−→
(1)
{b1}
(2)
]
itiban hidari no
(1)
tama
(2)
(the leftmost
(1)
ball
(2)
)
Figure 8: Realized expressions
H20�
20/20
��j	
�w��
t1
b3
b1
b4
t2
p1b2
Figure 9: An example stimulus of Experiment 1
Experiment 1 Experiment 1 was designed to
evaluate the ability of expressions to identify the
targets. The subjects were presented an arrange-
ment with a generated referring expression which
gained the highest score at a time, and were in-
structed to choose the object referred to by the ex-
pression. Figure 9 is an example of visual stimuli
used in Experiment 1. Each subject responded to
all 20 arrangements.
Experiment 2 Experiment 2 was designed to
evaluate validity of the scoring function described
in Section 3.4. The subjects were presented an
arrangement with a marked target together with
the best five generated expressions referring to the
target at a time. Then the subjects were asked
to choose the best one from the five expressions.
Figure 10 is an example of visual stimuli used in
Experiment 2. Each subject responded to the all
20 arrangements. The expressions used in Experi-
ment 2 include those used in Experiment 1.
4.2 Results
Table 1 shows the results of Experiment 1. The
average accuracy of target identification is 95%.
Figure 10: An example stimulus of Experiment 2
This shows a good performance of the generation
algorithm proposed in this paper.
The expression generated for arrangement
No. 20 (shown in Figure 9) resulted in the excep-
tionally poor accuracy. To refer to object b1, our
algorithm generated expression “itiban temae no
tama (the most front ball)” because b1 is the most
close object to person P in terms of the vertical
axis. Humans, however, chose theobject that isthe
closest to P in terms of Euclidean distance. Some
psychological investigation is necessary to build
a more precise geometrical calculation model to
solve this problem (Landragin et al., 2001).
Table 2 shows the results of Experiment 2. The
first row shows the rank of expressions based on
their score. The second row shows the count ofhu-
man votes for the expression. The third row shows
the ratio of the votes. The top two expressions oc-
cupy 72% of the total. This concludes that our
scoring function works well.
5Conclusion
This paper extended the SOG representation pro-
posed in (Funakoshi et al., 2004) to generate refer-
79
Table 1: Result of Experiment 1
ArangementNo.1 2 34567 8 910
Accuracy 0.89 1.0 1.0 1.0 1.0 1.0 1.0 0.94 1.0 1.0
11 12 13 14 15 16 17 18 19 20 Ave.
1.0 0.94 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.17 0.95
Table 2: Result of Experiment 2
Rank 12345Total
Vote 134 125 59 22 20 360
Share 0.37 0.35 0.16 0.06 0.06 1
ring expressions in more general situations.
The proposed method was implemented and
evaluated through two psychological experiments
using 18 subjects. The experiments showed that
generated expressions had enough discrimination
ability and that the scoring function conforms to
human preference well.
The proposed method would be able to handle
other attributes and relations as far as they can be
represented in terms of features as described in
section 3. Corresponding surface realization rules
might be added in that case.
In the implementation, we introduced rather ad
hoc parameters, particularly in the scoring func-
tion. Although this worked well in our experi-
ments, further psychological validation is indis-
pensable.
This paper assumed a fixed reference frame is
shared by all participants in a situation. How-
ever, when we apply our method to conversational
agent systems, e.g., (Tanaka et al., 2004), refer-
ence frames change dynamically and they must
be properly determined each time when generat-
ing referring expressions.
In this paper, we focused on two dimensional
situations. To apply our method to three dimen-
sional worlds, more investigation on human per-
ception of spatial relations are required. We ac-
knowledge that a simple application of the current
method does not work well enough in three dimen-
sional worlds.

References
Douglas E. Appelt. 1985. Planning English referring expres-
sions. Artificial Intelligence, 26:1–33.
Robert Dale and Nicholas Haddock. 1991. Generating re-
ferring expressions involving relations. In Proceedings of
the Fifth Conference of the European Chapter of the As-
sociation for Computational Linguistics(EACL’91), pages
161–166.
Robert Dale and Ehud Reiter. 1995. Computational interpre-
tations of the Gricean maxims in the generation of refer-
ring expressions. Cognitive Science, 19(2):233–263.
Robert Dale. 1992. Generating referring expressions: Con-
structing descriptions in a domain of objects and pro-
cesses. MIT Press, Cambridge.
Kotaro Funakoshi, Satoru Watanabe, Naoko Kuriyama, and
Takenobu Tokunaga. 2004. Generating referring expres-
sions using perceptual groups. In Proceedings of the 3rd
International Conference on Natural Language Genera-
tion: INLG04, pages 51–60.
Peter Heeman and Graeme Hirst. 1995. Collaborating refer-
ring expressions. Computational Linguistics, 21(3):351–
382.
Helmut Horacek. 1997. An algorithm for generating refer-
entialdescriptions withflexibleinterfaces. InProceedings
of the 35th Annual Meeting of the Association for Compu-
tational Linguistics, pages 206–213.
Emiel Krahmer and Mari¨et Theune. 2002. Efficient context-
sensitive generation of descriptions. In Kees van Deemter
and Rodger Kibble, editors, Information Sharing: Given-
ness and Newness in Language Processing. CSLI Publica-
tions, Stanford, California.
EmielKrahmer, Sebastiaan vanErk, and Andr´e Verleg. 2003.
Graph-based generation of referring expressions. Compu-
tational Linguistics, 29(1):53–72.
Fr´ed´eric Landragin, Nadia Bellalem, and Laurent Romary.
2001. Visual salience and perceptual grouping in mul-
timodal interactivity. In Proceedings of International
Workshop on Information Presentation and Natural Mul-
timodal Dialogue (IPNMD), pages 151–155.
Matthew Stone. 2000. On identifying sets. In Proceedings
of the 1st International Conference on Natural Language
Generation: INLG00, pages 116–123.
Hozumi Tanaka, Takenobu Tokunaga, and Yusuke Shinyama.
2004. Animated agents capable of understanding natural
language and performing actions. In Helmut Prendinger
and Mituru Ishizuka, editors, Life-Like Characters, pages
429–444. Springer.
KristinnR.Th´orisson. 1994. Simulatedperceptual grouping:
An application to human-computer interaction. In Pro-
ceedings of the Sixteenth Annual Conference of the Cog-
nitive Science Society, pages 876–881.
Takenobu Tokunaga, Tomofumi Koyama, and Suguru Saito.
2005. Meaning of Japanese spatial nouns. In Proceedings
of the Second ACL-SIGSEM Workshop on the Linguistic
Dimentions of Prepositions and their Use in Computa-
tional Linguistics: Formalisms and Applications, pages
93–100.
Kees van Deemter. 2002. Generating referring expressions:
Boolean extensions of the incremental algorithm. Compu-
tational Linguistics, 28(1):37–52.
