Generating Minimal Definite Descriptions
Claire Gardent
CNRS, LORIA, Nancy
gardent@loria.fr
Abstract
The incremental algorithm introduced in
(Dale and Reiter, 1995) for producing dis-
tinguishing descriptions does not always
generate a minimal description. In this
paper, I show that when generalised to
sets of individuals and disjunctive proper-
ties, this approach might generate unnec-
essarily long and ambiguous and/or epis-
temically redundant descriptions. I then
present an alternative, constraint-based al-
gorithm and show that it builds on existing
related algorithms in that (i) it produces
minimal descriptions for sets of individu-
als using positive, negative and disjunctive
properties, (ii) it straightforwardly gener-
alises to n-ary relations and (iii) it is inte-
grated with surface realisation.
1 Introduction
In English and in many other languages, a possible
function of definite descriptions is to identify a set
of referents1: by uttering an expression of the form
The N, the speaker gives sufficient information to the
hearer so that s/he can identify the set of the objects
the speaker is referring to.
From the generation perspective, this means that,
starting from the set of objects to be described and
from the properties known to hold of these objects
by both the speaker and the hearer, a definite
description must be constructed which allows the user
to unambiguously identify the objects being talked
about.
1The other well-known function of a definite is to inform the
hearer of some specific attributes the referent of the NP has.
While the task of constructing singular definite
descriptions on the basis of positive properties has
received much attention in the generation literature
(Dale and Haddock, 1991; Dale and Reiter, 1995;
Horacek, 1997; Krahmer et al., 2001), for a long
time, a more general statement of the task at hand re-
mained outstanding. Recently however, several pa-
pers made a step in that direction. (van Deemter,
2001) showed how to extend the basic Dale and Re-
iter Algorithm (Dale and Reiter, 1995) to generate
plural definite descriptions using not just conjunc-
tions of positive properties but also negative and
disjunctive properties; (Stone, 1998) integrates the
D&R algorithm into the surface realisation process
and (Stone, 2000) extends it to deal with collective
and distributive plural NPs.
Notably, in all three cases, the incremental structure
of the D&R algorithm is preserved: the algorithm
increments a set of properties until this set
uniquely identifies the target set i.e., the set of
objects to be described. As (Garey and Johnson, 1979)
shows, such an incremental algorithm, while being
polynomial (and this, together with certain
psycholinguistic observations, was one of the primary
motivations for privileging this incremental strategy),
is not guaranteed to find the minimal solution i.e.,
the description which uniquely identifies the target
set using the smallest number of atomic properties.
In this paper, I argue that this characteristic of the
incremental algorithm, while reasonably innocuous
when generating singular definite descriptions using
only conjunctions of positive properties, renders it
cognitively inappropriate when generalised to sets of
individuals and disjunctive properties. I present an
alternative approach which always produces the minimal
description, thereby avoiding the shortcomings
of the incremental algorithm. I conclude by comparing
the proposed approach with related proposals
and giving pointers for further research.

Proceedings of the 40th Annual Meeting of the Association for
Computational Linguistics (ACL), Philadelphia, July 2002, pp. 96-103.
2 The incremental approach
Dale and Reiter’s incremental algorithm (cf. Fig-
ure 1) iterates through the properties of the target
entity (the entity to be described) selecting a prop-
erty, adding it to the description being built and com-
puting the distractor set i.e., the set of elements for
which the conjunction of properties selected so far
holds. The algorithm succeeds (and returns the se-
lected properties) when the distractor set is the sin-
gleton set containing the target entity. It fails if all
properties of the target entity have been selected and
the distractor set contains more than the target entity
(i.e. there is no distinguishing description for the
target).
This basic algorithm can be refined by ordering
properties according to some fixed preferences and
thereby selecting e.g., first some base level category
in a taxonomy, second a size attribute, third a colour
attribute, etc.
$D$: the domain;
$P_r$: the set of properties of $r$;
To generate the UID $L_r$, do:
1. Initialise: $C := D$, $L_r := \emptyset$.
2. Check success:
   If $C = \{r\}$ return $L_r$
   elseif $P_r = \emptyset$ then fail
   else goto step 3.
3. Choose property $p_i \in P_r$ which picks out the smallest set
   $C_i = C \cap \{x \mid p_i(x)\}$.
4. Update: $L_r := L_r \cup \{p_i\}$, $C := C_i$, $P_r := P_r \setminus \{p_i\}$. goto
   step 2.
Figure 1: The D&R incremental Algorithm.
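The steps of Figure 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: properties are represented by their extensions (sets of entity names), ties in step 3 are broken alphabetically so that the run is deterministic, and the toy domain (`d1`, `d2`, `c1`) is hypothetical rather than taken from the paper.

```python
def incremental_description(target, domain, extensions):
    """Greedy sketch of Figure 1.

    `extensions` maps a property name to the set of entities it holds of.
    Returns the selected property names, or None if no distinguishing
    description exists.
    """
    distractors = set(domain)                                  # C := D
    candidates = {p for p, ext in extensions.items() if target in ext}  # P_r
    description = []                                           # L_r := {}
    while distractors != {target}:                             # step 2
        if not candidates:
            return None
        # Step 3: property leaving the smallest set C_i (alphabetical tie-break,
        # an assumption made here for determinism).
        best = min(sorted(candidates),
                   key=lambda p: len(distractors & extensions[p]))
        description.append(best)                               # step 4
        distractors &= extensions[best]
        candidates.remove(best)
    return description


# Hypothetical toy domain: two dogs and a cat.
toy_domain = {"d1", "d2", "c1"}
toy_extensions = {
    "dog": {"d1", "d2"}, "cat": {"c1"},
    "small": {"d1", "c1"}, "large": {"d2"},
    "black": {"d1", "c1"}, "white": {"d2"},
}
print(incremental_description("d1", toy_domain, toy_extensions))
print(incremental_description("d2", toy_domain, toy_extensions))
```

Because each step only looks at the current distractor set, the result depends on the order in which properties are tried, which is the behaviour the rest of the paper examines.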
(van Deemter, 2001) generalises the D&R algorithm
first, to plural definite descriptions and second,
to disjunctive and negative properties as indicated in
Figure 2. That is, the algorithm starts with a distractor
set $C$ which initially is equal to the set of
individuals present in the context. It then incrementally
selects a property $P$ that is true of the target set
($S \subseteq [\![P]\!]$) but not of all elements in the distractor
set ($C \not\subseteq [\![P]\!]$). Each selected property is thus
used to simultaneously increment the description being
built and to eliminate some distractors. Success
occurs when the distractor set equals the target set.
The result is a distinguishing description (DD, a
description that is true only of the target set) which is
the conjunction of properties selected to reach that
state.
$D$: the domain;
$S \subseteq D$: the set to be described;
$P_S$: the properties true of the set $S$ ($P_S = \bigcap_{x \in S} P_x$ with
$P_x$ the set of properties that are true of $x$);
To generate the distinguishing description $L_S$, do:
1. Initialise: $C := D$, $L_S := \emptyset$.
2. Check success:
   If $C = S$ return $L_S$
   elseif $P_S = \emptyset$ then fail
   else goto step 3.
3. Choose property $p_i \in P_S$ s.t. $S \subseteq [\![p_i]\!]$ and $C \not\subseteq [\![p_i]\!]$
4. Update: $L_S := L_S \cup \{p_i\}$, $C := C \cap [\![p_i]\!]$, $P_S := P_S \setminus \{p_i\}$.
   goto step 2.
Figure 2: Extending D&R Algorithm to sets of individuals.
Phase 1: Perform the extended D&R algorithm using all literals
i.e., properties in $P_{lit}$; if this is successful then stop,
otherwise go to phase 2.
Phase 2: Perform the extended D&R algorithm using all properties
of the form $P \lor P'$ with $P, P' \in P_{lit}$; if this is
successful then stop, otherwise go to phase 3.
Figure 3: Extending D&R Algorithm to disjunctive
properties
To generalise this algorithm to disjunctive and
negative properties, van Deemter adds one more
level of incrementality, an incrementality over the
length of the properties being used (cf. Figure 3).
First, literals are used i.e., atomic properties and
their negation. If this fails, disjunctive properties of
length two (i.e. with two literals) are used; then of
length three etc.
3 Problems
We now show that this generalised algorithm might
generate (i) epistemically redundant descriptions
and (ii) unnecessarily long and ambiguous descrip-
tions.
Epistemically redundant descriptions. Suppose
the context is as illustrated in Figure 4 and the target
set is $\{x_1, x_2\}$.

         pdt   secr   treasurer   board-member   member
$x_1$     ×                            ×            ×
$x_2$            ×                     ×            ×
$x_3$                     ×            ×            ×
$x_4$                                  ×            ×
$x_5$                                  ×            ×
$x_6$                                               ×

Figure 4: Epistemically redundant descriptions
“The president and the secretary who are board
members and not treasurers”
To build a distinguishing description for the target
set $\{x_1, x_2\}$, the incremental algorithm will
first look for a property $P$ in the set of literals
such that (i) $\{x_1, x_2\}$ is included in the extension of $P$ and
(ii) $P$ is not true of all elements in the distractor
set $C$ (which at this stage is the whole universe
i.e., $\{x_1, x_2, x_3, x_4, x_5, x_6\}$). Two literals satisfy
these criteria: the property of being a board member
and that of not being the treasurer2. Suppose
the incremental algorithm first selects the board-member
property thereby reducing the distractor set
to $\{x_1, x_2, x_3, x_4, x_5\}$. Then $\lnot treasurer$ is selected
which restricts the distractor set to $\{x_1, x_2, x_4, x_5\}$.
There is no other literal which could be used to further
reduce the distractor set hence properties of the
form $P \lor P'$ are used. At this stage, the algorithm
might select the property $pdt \lor secr$ whose
intersection with the distractor set yields the target
set $\{x_1, x_2\}$. Thus, the description produced is in
this case: $board\text{-}member \land \lnot treasurer \land (pdt \lor secr)$
which can be phrased as the president and the secretary
who are board members and not treasurers,
whereas the minimal DD the president and the secretary
would be a much better output.
2Note that selecting properties in order of specificity will
not help in this case as neither president nor treasurer meet the
selection criterion (their extension does not include the target
set).
One problem thus is that, although perfectly well
formed minimal DDs might be available, the incre-
mental algorithm may produce “epistemically re-
dundant descriptions” i.e. descriptions which in-
clude information already entailed (through what we
know) by some information present elsewhere in the
description.
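The walk-through above can be replayed with a small sketch of the generalised algorithm (Figure 2 run in phases of increasing disjunction length, as in Figure 3). Everything concrete here is an assumption made for illustration: the six-individual context is one that is consistent with the text, literals are tried in alphabetical order, and `extended_incremental` is a hypothetical name, not van Deemter's implementation.

```python
from itertools import combinations


def literals(universe, props):
    """Extensions of the atomic properties and of their negations."""
    lits = {}
    for name, ext in props.items():
        lits[name] = set(ext)
        lits["not " + name] = set(universe) - set(ext)
    return lits


def extended_incremental(target, universe, props, max_len=3):
    """Select every (disjunction of) literal(s) that is true of the whole
    target set and removes at least one distractor; stop when only the
    target set is left."""
    lits = literals(universe, props)
    distractors = set(universe)
    description = []
    for length in range(1, max_len + 1):          # phase 1, phase 2, ...
        for combo in combinations(sorted(lits), length):
            ext = set().union(*(lits[name] for name in combo))
            if target <= ext and not distractors <= ext:
                description.append(combo)
                distractors &= ext
                if distractors == target:
                    return description
    return None


# Assumed context, consistent with the example discussed above.
universe = {"x1", "x2", "x3", "x4", "x5", "x6"}
props = {
    "pdt": {"x1"},
    "secr": {"x2"},
    "treasurer": {"x3"},
    "board-member": {"x1", "x2", "x3", "x4", "x5"},
    "member": set(universe),
}
result = extended_incremental({"x1", "x2"}, universe, props)
print(result)
```

Under these assumptions the sketch returns a conjunction of three conjuncts containing four literals, although the last conjunct alone (`pdt` or `secr`) already picks out the target set.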
Unnecessarily long and ambiguous descriptions.
Another aspect of the same problem is that the
algorithm may yield unnecessarily long and ambiguous
descriptions. Here is an example. Suppose the
context is as given in Figure 5 and the target set is
$\{x_5, x_6, x_9, x_{10}\}$.
W D C B S M Pi Po H J
$x_1$ ×
$x_2$ × ×
$x_3$ × ×
$x_4$ × × ×
$x_5$ × × × ×
$x_6$ × × × ×
$x_7$ × × ×
$x_8$ × × ×
$x_9$ × × × ×
$x_{10}$ × × × ×
$x_{11}$
W = white; D = dog; C = cow; B = big; S = small;
M = medium-sized; Pi = pitbull; Po = poodle; H = Holstein; J =
Jersey
Figure 5: Unnecessarily long descriptions.
The most natural and probably shortest description
in this case is a description involving a disjunction
with four disjuncts namely $Pi \lor Po \lor H \lor J$
which can be verbalised as the pitbull, the poodle,
the Holstein and the Jersey.
This is not however, the description that will be
returned by the incremental algorithm. Recall that
at each step in the loop going over the properties
of various (disjunctive) lengths, the incremental
algorithm adds to the description being built any
property that is true of the target set and such that
the current distractor set is not included in the set
of objects having that property. Thus in the first
loop over properties of length one, the algorithm
will select the property $W$, add it to the description
and update the distractor set to
$C \cap [\![W]\!] = \{x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8, x_9, x_{10}\}$. Since the
new distractor set is not equal to the target set
and since no other property of length one satisfies
the selection criteria, the algorithm proceeds with
properties of length two. Figure 6 lists the properties
$P$ of length two meeting the selection criteria
at that stage ($\{x_5, x_6, x_9, x_{10}\} \subseteq [\![P]\!]$ and
$\{x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8, x_9, x_{10}\} \not\subseteq [\![P]\!]$).

$H \lor \lnot S$    $\{x_1, x_2, x_3, x_4, x_5, x_6, x_8, x_9, x_{10}\}$
$J \lor \lnot M$    $\{x_1, x_2, x_3, x_5, x_6, x_7, x_8, x_9, x_{10}\}$
$B \lor \lnot D$    $\{x_1, x_3, x_4, x_5, x_6, x_7, x_8, x_9, x_{10}\}$
$D \lor C$          $\{x_2, x_3, x_4, x_5, x_6, x_7, x_8, x_9, x_{10}\}$
$B \lor C$          $\{x_3, x_4, x_5, x_6, x_7, x_8, x_9, x_{10}\}$

Figure 6: Properties of length 2 meeting the selection
criterion
The incremental algorithm selects any of these
properties to increment the current DD. Suppose
it selects $B \lor C$. The DD is then updated
to $W \land (B \lor C)$ and the distractor set to
$\{x_3, x_4, x_5, x_6, x_7, x_8, x_9, x_{10}\}$. Except for $D \lor C$
and $\lnot D \lor B$ which would not eliminate any distractor,
each of the other properties in the table can
be used to further reduce the distractor set. Thus
the algorithm will eventually build the description
$W \land (B \lor C) \land (H \lor \lnot S) \land (J \lor \lnot M)$ thereby
reducing the distractor set to $\{x_3, x_5, x_6, x_8, x_9, x_{10}\}$.
At this point success still has not been reached
(the distractor set is not equal to the target set).
It will eventually be reached (at the latest when
incrementing the description with the disjunction
$Pi \lor Po \lor H \lor J$). However, already at this stage
of processing, it is clear that the resulting description
will be awkward to phrase. A direct translation
from the description built so far
($W \land (B \lor C) \land (H \lor \lnot S) \land (J \lor \lnot M)$) would yield e.g.,
(1) The white things that are big or a cow, a Holstein
or not small, and a Jersey or not medium
size
Another problem then, is that when generalised
to disjunctive and negative properties, the incremen-
tal strategy might yield descriptions that are unnec-
essarily ambiguous (because of the high number of
logical connectives they contain) and in the extreme
cases, incomprehensible.
4 An alternative based on set constraints
One possible solution to the problems raised by the
incremental algorithm is to generate only minimal
descriptions i.e. descriptions which use the smallest
number of literals to uniquely identify the target set.
By definition, these will never be redundant nor will
they be unnecessarily long and ambiguous.
As (Dale and Reiter, 1995) shows, the problem
of finding minimal distinguishing descriptions can
be formulated as a set cover problem and is therefore
known to be NP-hard. However, given an efficient
implementation this might not be a hindrance
in practice. The alternative algorithm I propose is
therefore based on the use of constraint programming
(CP), a paradigm aimed at efficiently solving
NP-hard combinatorial problems such as scheduling
and optimization. Instead of following a
generate-and-test strategy which might result in an intractable
search space, CP minimises the search space by
following a propagate-and-distribute strategy where
propagation draws inferences on the basis of efficient,
deterministic inference rules and distribution
performs a case distinction for a variable value.
The basic version. Consider the definition of a
distinguishing description given in (Dale and Reiter,
1995).
    Let $r$ be the intended referent, and $C$ be
    the distractor set; then, a set $L$ of attribute-value
    pairs will represent a distinguishing
    description if the following two conditions
    hold:
    C1: Every attribute-value pair in $L$ applies
        to $r$: that is, every element of
        $L$ specifies an attribute value that $r$
        possesses.
    C2: For every member $c$ of $C$, there is at
        least one element $l$ of $L$ that does not
        apply to $c$: that is, there is an $l$ in $L$
        that specifies an attribute-value that $c$
        does not possess. $l$ is said to rule out
        $c$.
The constraints (cf. Figure 7) used in the proposed
algorithm directly mirror this definition.
A description for the target set $S$ is represented
by a pair of set variables constrained to be a subset
of the set of positive (i.e., properties that are true of
all elements in $S$) and of negative (i.e., properties
that are true of none of the elements in $S$) properties
of $S$ respectively. The third constraint ensures that
the conjunction of properties thus built eliminates all
distractors i.e. each element of the universe which is
not in $S$. More specifically, it states that for each
distractor $c$ there is at least one property $P$ such that
either $P$ is true of (all elements in) $S$ but not of $c$ or
$P$ is false of (all elements in) $S$ and true of $c$.

$U$: the universe;
$\mathcal{P}^+_x$: the set of properties $x$ has;
$\mathcal{P}^-_x = \mathcal{P} \setminus \mathcal{P}^+_x$: the set of properties $x$ does not have;
$\mathcal{P}^+_S = \bigcap_{x \in S} \mathcal{P}^+_x$: the set of properties true of all elements of $S$;
$\mathcal{P}^-_S = \mathcal{P} \setminus \bigcup_{x \in S} \mathcal{P}^+_x$: the set of properties false of all
elements of $S$;
$D_S = \langle P^+_S, P^-_S \rangle$ is a basic distinguishing description
for $S$ iff:
1. $P^+_S \subseteq \mathcal{P}^+_S$,
2. $P^-_S \subseteq \mathcal{P}^-_S$ and
3. $\forall c \in C_S, \; |(P^+_S \setminus \mathcal{P}^+_c) \cup (P^-_S \cap \mathcal{P}^+_c)| \geq 1$
Figure 7: A constraint-based approach
The constraints thus specify what it is to be a DD
for a given target set. Additionally, a distribution
strategy needs to be made precise which specifies
how to search for solutions i.e., for assignments of
values to variables such that all constraints are
simultaneously verified. To ensure that solutions are
searched for in increasing order of size, we distribute
(i.e. make case distinctions) over the cardinality of
the output description $|P^+_S \cup P^-_S|$ starting with the
lowest possible value. That is, first the algorithm
will try to find a description $\langle P^+_S, P^-_S \rangle$ with
cardinality one, then with cardinality two etc. The
algorithm stops as soon as it finds a solution. In this way,
the description output by the algorithm is guaranteed
to always be the shortest possible description.
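The three constraints and the search in increasing order of cardinality can be illustrated with a plain enumeration. The sketch below is a brute-force stand-in written for this purpose, not the constraint solver used in the paper: it performs no propagation, the function name `basic_dd` is hypothetical, and the six-individual context is assumed (it matches the president/secretary example only in so far as the text determines it). A literal is written as a pair `(property, polarity)`.

```python
from itertools import combinations


def basic_dd(target, universe, has):
    """Smallest set of literals that are true of every element of `target`
    and rule out every other individual; `has[x]` is the set of properties
    that x possesses. Returns None if no conjunctive description exists."""
    all_props = set().union(*has.values())
    pos = set.intersection(*(has[x] for x in target))          # true of all of S
    neg = all_props - set().union(*(has[x] for x in target))   # false of all of S
    candidates = sorted([(p, True) for p in pos] + [(p, False) for p in neg])
    distractors = set(universe) - set(target)
    for size in range(1, len(candidates) + 1):     # cardinality 1, then 2, ...
        for combo in combinations(candidates, size):
            # Constraint 3: each distractor lacks a selected positive property
            # or has a selected negative one.
            if all(any((p in has[c]) != positive for p, positive in combo)
                   for c in distractors):
                return list(combo)
    return None


# Assumed context for illustration.
has = {
    "x1": {"pdt", "board-member", "member"},
    "x2": {"secr", "board-member", "member"},
    "x3": {"treasurer", "board-member", "member"},
    "x4": {"board-member", "member"},
    "x5": {"board-member", "member"},
    "x6": {"member"},
}
print(basic_dd({"x3"}, set(has), has))
print(basic_dd({"x6"}, set(has), has))
print(basic_dd({"x1", "x2"}, set(has), has))
```

In this context the last call finds no purely conjunctive description for the set containing `x1` and `x2`, which is the case the disjunctive extension is meant to handle.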
Extending the algorithm with disjunctive properties.
To take into account disjunctive properties,
the constraints used can be modified as indicated in
Figure 8.
That is, the algorithm looks for a tuple of sets such
that their union $S_1 \cup \ldots \cup S_n$ is the target set $S$ and
such that for each set $S_i$ in that tuple there is a basic
DD $D_{S_i}$. The resulting description is the disjunctive
description $D_{S_1} \lor \ldots \lor D_{S_n}$ where each $D_{S_i}$ is a
conjunctive description.

$D_S = D_{S_1} \lor \ldots \lor D_{S_n}$ is a distinguishing description
for a set of individuals $S$ iff:
• $1 \leq n \leq |S|$
• $S = S_1 \cup \ldots \cup S_n$
• for $1 \leq i \leq n$, $D_{S_i}$ is a basic distinguishing
  description for $S_i$
Figure 8: With disjunctive properties
As before solutions are searched for in increasing
order of size (i.e., number of literals occurring in the
description) by distributing over the cardinality of
the resulting description.
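A minimal sketch of this extension, again by plain enumeration rather than constraint propagation: compute a smallest basic DD for every subset of the target set that has one, then pick the cover of the target set with the fewest literals overall. The helper names, the exhaustive comparison of covers (instead of distributing over the cardinality), and the context are assumptions made for illustration.

```python
from itertools import combinations


def basic_dd(target, universe, has):
    """Smallest conjunctive description (list of (property, polarity) pairs)
    true only of `target`, or None."""
    all_props = set().union(*has.values())
    pos = set.intersection(*(has[x] for x in target))
    neg = all_props - set().union(*(has[x] for x in target))
    candidates = sorted([(p, True) for p in pos] + [(p, False) for p in neg])
    distractors = set(universe) - set(target)
    for size in range(1, len(candidates) + 1):
        for combo in combinations(candidates, size):
            if all(any((p in has[c]) != pol for p, pol in combo)
                   for c in distractors):
                return list(combo)
    return None


def disjunctive_dd(target, universe, has):
    """Disjunction of basic DDs whose described sets cover `target`,
    using as few literals as possible in total."""
    members = sorted(target)
    parts = {}                                   # subset S_i -> basic DD
    for r in range(1, len(members) + 1):
        for subset in combinations(members, r):
            dd = basic_dd(set(subset), universe, has)
            if dd is not None:
                parts[subset] = dd
    best = None
    for n in range(1, len(members) + 1):         # 1 <= n <= |S|
        for cover in combinations(sorted(parts), n):
            if set().union(*map(set, cover)) == set(members):
                size = sum(len(parts[s]) for s in cover)
                if best is None or size < best[0]:
                    best = (size, [parts[s] for s in cover])
    return None if best is None else best[1]


# Assumed context for illustration.
has = {
    "x1": {"pdt", "board-member", "member"},
    "x2": {"secr", "board-member", "member"},
    "x3": {"treasurer", "board-member", "member"},
    "x4": {"board-member", "member"},
    "x5": {"board-member", "member"},
    "x6": {"member"},
}
print(disjunctive_dd({"x1", "x2"}, set(has), has))
```

For the target set made of `x1` and `x2` this yields the two-literal disjunction of `pdt` and `secr`, i.e. the president and the secretary.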
5 Discussion and comparison with related
work
Integration with surface realisation. As (Stone
and Webber, 1998) clearly shows, the two-step strategy
which consists in first computing a DD and second,
generating a definite NP realising that DD, does
not do language justice. This is because, as the following
example from (Stone and Webber, 1998) illustrates,
the information used to uniquely identify
some object need not be localised to a definite
description.
(2) Remove the rabbit from the hat.
In a context where there are several rabbits and
several hats but only one rabbit in a hat (and only
one hat containing a rabbit), the sentence in (2) is
sufficient to identify the rabbit that is in the hat. In
this case thus, it is the presupposition of the verb
“remove” which ensures this: since x remove y from z
presupposes that y was in z before the action, we can
infer from (2) that the rabbit talked about is indeed
the rabbit that is in the hat.
The solution proposed in (Stone and Webber,
1998) and implemented in the SPUD (Sentence Plan-
ning Using Descriptions) generator is to integrate
surface realisation and DD computation. As a prop-
erty true of the target set is selected, the correspond-
ing lexical entry is integrated in the phrase structure
tree being built to satisfy the given communicative
goals. Generation ends when the resulting tree (i)
satisfies all communicative goals and (ii) is syntac-
tically complete. In particular, the goal of describ-
ing some discourse old entity using a definite de-
scription is satisfied as soon as the given informa-
tion (i.e. information shared by speaker and hearer)
associated by the grammar with the tree suffices to
uniquely identify this object.
Similarly, the constraint-based algorithm for
generating DDs presented here has been integrated
with surface realisation within the generator INDIGEN
(http://www.coli.uni-sb.de/cl/projects/indigen.html) as follows.
As in SPUD, the generation process is driven by
the communicative goals and in particular, by in-
forming and describing goals. In practice, these
goals contribute to updating a “goal semantics”
which the generator seeks to realise by building a
phrase structure tree that (i) realises that goal seman-
tics, (ii) is syntactically complete and (iii) is prag-
matically appropriate.
Specifically, if an entity must be described which
is discourse old, a DD will be computed for that en-
tity and added to the current goal semantics thereby
driving further generation.
Like SPUD, this modified version of the SPUD
algorithm can account for the fact that a DD need not
be wholly realised within the corresponding NP: as
a DD is added to the goal semantics, it guides the
lexical lookup process (only items in the lexicon whose
semantics subsumes part of the goal semantics are
selected) but there is no restriction on how the given
semantic information is realised.
Unlike SPUD however, the INDIGEN generator
does not follow an incremental greedy search strategy
mirroring the incremental D&R algorithm (at
each step in the generation process, SPUD compares
all possible continuations and only pursues the best
one; there is no backtracking). It follows a
chart-based strategy instead (Striegnitz, 2001), producing
all possible paraphrases. The drawback is of course
a loss in efficiency. The advantages on the other
hand are twofold.
First, INDIGEN only generates definite descriptions
that realise minimal DDs. Thus unlike SPUD, it
will not run into the problems mentioned in section
3 once generalised to negative and disjunctive
properties.
Second, if there is no DD for a given entity, this
will be immediately noticed in the present approach
thus allowing for a non definite NP or a quantifier
to be constructed instead. In contrast, SPUD will, if
unconstrained, keep adding material to the tree until
all properties of the object to be described have been
realised. Once all properties have been realised and
since there is no backtracking, generation will fail.
N-ary relations. The set variables used in our
constraint solver are variables ranging over sets of
integers. This, in effect, means that prior to applying
constraints, the algorithm will perform an encoding
of the objects being constrained (individuals and
properties) into (pairwise distinct) integers. It follows
that the algorithm easily generalises to n-ary
relations. Just like the proposition $red(e_1)$ using the
unary relation “red” can be encoded by an integer,
so can the proposition $on(e_1, e_2)$ using the binary
relation “on” be encoded by two integers (one for
$on(\_, e_2)$ and one for $on(e_1, \_)$).
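The encoding just described can be sketched as follows. The `Encoder` class and `encode_fact` function are hypothetical names introduced for illustration; the paper does not say how its implementation assigns integers, only that individuals and properties are mapped to pairwise distinct integers. Here each argument position of a fact yields one derived unary property, written with `_` in the abstracted position.

```python
from itertools import count


class Encoder:
    """Assigns pairwise distinct integers to the keys it is asked about."""

    def __init__(self):
        self._codes = {}
        self._fresh = count(1)

    def code(self, key):
        if key not in self._codes:
            self._codes[key] = next(self._fresh)
        return self._codes[key]


def encode_fact(encoder, relation, args):
    """Return (individual, integer) pairs, one per argument position.

    on(e1, e2) gives e1 the property on(_, e2) and e2 the property on(e1, _).
    """
    pairs = []
    for i, individual in enumerate(args):
        abstracted = tuple("_" if j == i else a for j, a in enumerate(args))
        pairs.append((individual, encoder.code((relation, abstracted))))
    return pairs


enc = Encoder()
print(encode_fact(enc, "red", ("e1",)))       # one integer for red(_)
print(encode_fact(enc, "on", ("e1", "e2")))   # two integers for on(e1, e2)
```

With such an encoding the sets of properties an individual has are plain sets of integers, so the same set constraints apply regardless of the arity of the original relation.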
Thus the present algorithm improves on (van
Deemter, 2001) which is restricted to unary relations.
It also differs from (Krahmer et al., 2001),
who use graphs and graph algorithms for computing
DDs: while graphs provide a transparent encoding
of unary and binary relations, they lose much of their
intuitive appeal when applied to relations of higher
arity.
It is also worth noting that the infinite regress
problem observed by (Dale and Haddock, 1991) to hold
for the D&R algorithm (and similarly for van
Deemter’s generalisation of it) when extended to deal
with binary relations, does not hold in the present
approach.
In the D&R algorithm, the problem stems from
the fact that DDs are generated recursively: if when
generating a DD for some entity $e_1$, a relation $r$ is
selected which relates $e_1$ to e.g., $e_2$, the D&R
algorithm will recursively go on to produce a DD for
$e_2$. Without additional restriction, the algorithm can
thus loop forever, first describing $e_1$ in terms of $e_2$,
then $e_2$ in terms of $e_1$, then $e_1$ in terms of $e_2$ etc.
The solution adopted by (Dale and Haddock,
1991) is to stipulate that facts from the knowledge
base can only be used once within a given call to the
algorithm.
In contrast, the solution follows, in the present
algorithm (as in SPUD), from its integration with
surface realisation. Suppose for instance, that the initial
goal is to describe the discourse old entity $e_1$. The
initially empty goal semantics will be updated with
its DD say, $\{bowl(b), on(b,t)\}$.

NP
  D  the
  N_b ↓
Goal Semantics = $\{bowl(b), on(b,t)\}$
This information is then used to select appropriate
lexical entries i.e., the noun entry for “bowl” and
the preposition entry for “on”. The resulting tree
(with leaves “the bowl on”) is syntactically incomplete
hence generation continues attempting to provide
a description for $t$. If $t$ is discourse old, the
lexical entry for the will be selected and a DD
computed say, $\{table(t), on(b,t)\}$. This then is added
to the current goal semantics yielding the goal
semantics $\{table(t), bowl(b), on(b,t)\}$ which is compared
with the semantics of the tree built so far i.e.,
$\{bowl(b), on(b,t)\}$.

NP
  D  the
  N_b
    N  bowl
    PP
      P  on
      NP
        D  the
        N_t ↓
Goal Semantics = $\{bowl(b), on(b,t), table(t)\}$
Tree Semantics = $\{bowl(b), on(b,t)\}$
Since goal and tree semantics are different, generation
continues, selecting the lexical entry for “table”
and integrating it in the tree being built.

NP
  D  the
  N
    N  bowl
    PP
      P  on
      NP
        D  the
        N_t  table
Goal Semantics = $\{bowl(b), on(b,t), table(t)\}$
Tree Semantics = $\{bowl(b), on(b,t), table(t)\}$

At this stage, the semantics of that tree is
$\{table(t), bowl(b), on(b,t)\}$ which is equivalent to
the goal semantics. Since furthermore the tree is
syntactically and pragmatically complete, generation
stops yielding the NP the bowl on the table.
In sum, infinite regress is avoided by using the
computed DDs to control the addition of new mate-
rial to the tree being built.
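The control regime just summarised can be mimicked with sets of facts. This is a deliberately simplified sketch, not the INDIGEN generator: there is no grammar or tree, syntactic incompleteness is approximated by "an entity is mentioned in the tree semantics but has not been described yet", and the function name `generate` and the precomputed DD table are assumptions made for illustration.

```python
def generate(entity, dd):
    """`dd` maps each entity to its (precomputed) distinguishing description,
    a set of facts written as tuples such as ("on", "b", "t").
    Returns the predicates realised, in order, and the final tree semantics."""
    goal = set(dd[entity])          # goal semantics
    tree = set()                    # semantics of the tree built so far
    selected = []                   # lexical entries chosen, by predicate name
    described = {entity}
    while True:
        # Realise only what is in the goal semantics but not yet in the tree.
        for fact in sorted(goal - tree):
            selected.append(fact[0])
            tree.add(fact)
        # Entities mentioned but not yet described keep generation going.
        pending = sorted({arg for fact in tree for arg in fact[1:]} - described)
        if not pending:
            return selected, tree
        for other in pending:
            described.add(other)
            goal |= dd[other]       # its DD is added to the goal semantics


dd = {
    "b": {("bowl", "b"), ("on", "b", "t")},
    "t": {("table", "t"), ("on", "b", "t")},
}
words, semantics = generate("b", dd)
print(words)
```

Because `on(b,t)` is already part of the tree semantics when the DD for `t` is added, it is not realised a second time, and the loop ends once goal and tree semantics coincide.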
Minimality and overspecified descriptions. It
has often been observed that human beings produce
overspecified i.e., non-minimal descriptions. One
might therefore wonder whether generating minimal
descriptions is in fact appropriate. Two points speak
for it.
First, it is unclear whether redundant information
is present because of a cognitive artifact (e.g.,
incremental processing) or because it helps fulfill some
other communicative goal besides identification. So
for instance, (Jordan, 1999) shows that in a specific
task context, redundant attributes are used to indicate
the violation of a task constraint (for instance,
when violating a colour constraint, a task participant
will use the description “the red table” rather than
“the table” to indicate that s/he violates a constraint
to the effect that red objects may not be used at that
stage of the task).
More generally, it seems unlikely that no rule at
all governs the presence of redundant information in
definite descriptions. If redundant descriptions are
to be produced, they should therefore be produced
in relation to some general principle (i.e., because
the algorithm goes through a fixed order of attribute
classes or because the redundant information fulfills
a particular communicative goal) not randomly, as is
done in the generalised incremental algorithm.
Second, the psycholinguistic literature bearing on
the presence of redundant information in definite
descriptions has mainly been concerned with unary
atomic relations. Again once binary, ternary and
disjunctive relations are considered, it is unclear that
the phenomenon generalises. As (Krahmer et al.,
2001) observed, “it is unlikely that someone would
describe an object as “the dog next to the tree in front
of the garage” in a situation where “the dog next to
the tree” would suffice”.
Implementation. The ideas presented in this pa-
per have been implemented within the genera-
tor INDIGEN using the concurrent constraint pro-
gramming language Oz (Programming Systems Lab
Saarbrücken, 1998), which supports set variables
ranging over finite sets of integers and provides an
efficient implementation of the associated constraint
theory. The proof-of-concept implementation in-
cludes the constraint solver described in section 4
and its integration in a chart-based generator inte-
grating surface realisation and inference. For the ex-
amples discussed in this paper, the constraint solver
returns the minimal solution (i.e., The cat and the
dog and The poodle, the Jersey, the pitbul and the
Holstein) in 80 ms and 1.4 seconds respectively. The
integration of the constraint solver within the gener-
ator permits realising definite NPs including nega-
tive information (the cat that is not white) and sim-
ple conjunctions (The cat and the dog).
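To make the notion of a minimal description with negative information concrete, the following is a small illustrative sketch in Python. It is not the Oz constraint solver described above: it swaps in a brute-force enumeration by increasing size, covers only conjunctions of positive and negative unary properties (no disjunctions or n-ary relations), and the names extension and minimal_description as well as the toy domain are hypothetical.

```python
from itertools import combinations


def extension(facts, description):
    """Return the entities that satisfy every (property, polarity) literal."""
    return {
        entity
        for entity, props in facts.items()
        if all((prop in props) == polarity for prop, polarity in description)
    }


def minimal_description(facts, targets, max_size=3):
    """Brute-force search for a smallest conjunction of literals whose
    extension is exactly the target set (assumption: not the paper's solver)."""
    props = sorted(set().union(*facts.values()))
    # Positive literals first, then their negations.
    literals = [(p, True) for p in props] + [(p, False) for p in props]
    for size in range(1, max_size + 1):  # smaller descriptions are tried first
        for candidate in combinations(literals, size):
            if extension(facts, candidate) == set(targets):
                return list(candidate)
    return None  # no distinguishing conjunction within max_size


# Hypothetical toy domain: two cats and two dogs, some of them white.
facts = {
    "c1": {"cat", "white"},
    "c2": {"cat"},
    "d1": {"dog"},
    "d2": {"dog", "white"},
}

print(minimal_description(facts, {"c2"}))        # [('cat', True), ('white', False)]
print(minimal_description(facts, {"c2", "d1"}))  # [('white', False)]
```

The first result corresponds to a description such as the cat that is not white. Because candidates are enumerated by increasing size, the first hit is minimal in the number of literals, at the cost of exponential search in the worst case.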
6 Conclusion
One area that deserves further investigation is the
relation to surface realisation. Once disjunctive
and negative relations are used, interesting questions
arise as to how these should be realised. How should
conjunctions, disjunctions and negations be realised
within the sentence? How are they realised in prac-
tice? And how can we impose the appropriate con-
straints so as to predict linguistically and cognitively
acceptable structures? More generally, there is the
question of which communicative goals refer to sets
rather than just individuals and of the relationship
to what in the generation literature has been bap-
tised “aggregation”, roughly the grouping together
of facts exhibiting various degrees and forms of sim-
ilarity.
Acknowledgments
I thank Denys Duchier for implementing the ba-
sic constraint solver on which this paper is based
and Marilisa Amoia for implementing the exten-
sion to disjunctive relations and integrating the con-
straint solver into the INDIGEN generator. I also
gratefully acknowledge the financial support of the
Conseil Régional de Lorraine and of the Deutsche
Forschungsgemeinschaft.
References
R. Dale and N. Haddock. 1991. Content determination
in the generation of referring expressions. Computa-
tional Intelligence, 7(4):252–265.
R. Dale and E. Reiter. 1995. Computational interpreta-
tions of the Gricean maxims in the generation of refer-
ring expressions. Cognitive Science, 18:233–263.
W. Garey and D. Johnson. 1979. Computers
and Intractability: a Guide to the Theory of NP-
Completeness. W.H.Freeman, San Francisco.
H. Horacek. 1997. An algorithm for generating referen-
tial descriptions with flexible interfaces. In Proceed-
ings of the 35th Annual Meeting of the Association for
Computational Linguistics, pages 206–213, Madrid.
P. W. Jordan. 1999. An empirical study of the commu-
nicative goals impacting nominal expressions. In the
Proceedings of the ESSLLI workshop on The Genera-
tion of Nominal Expression.
E. Krahmer, S. van Eerk, and André Verleg. 2001. A
meta-algorithm for the generation of referring expres-
sions. In Proceedings of the 8th European Workshop
on Natural Language Generation, Toulouse.
Programming Systems Lab Saarbrücken. 1998. Oz Web-
page: http://www.ps.uni-sb.de/oz/.
M. Stone and Bonnie Webber. 1998. Textual economy
through closely coupled syntax and semantics. In Pro-
ceedings of the Ninth International Workshop on Nat-
ural Language Generation, pages 178–187, Niagara-
on-the-Lake, Canada.
M. Stone. 1998. Modality in Dialogue: Planning, Prag-
matics and Computation. Ph.D. thesis, Department of
Computer & Information Science, University of Penn-
sylvania.
M. Stone. 2000. On Identifying Sets. In Proceedings
of the First International Conference on Natural Lan-
guage Generation, Mitzpe Ramon.
Kristina Striegnitz. 2001. A chart-based generation algo-
rithm for LTAG with pragmatic constraints. To appear.
K. van Deemter. 2001. Generating Referring Expres-
sions: Boolean Extensions of the Incremental Algo-
rithm. To appear in Computational Linguistics.
