Proceedings of the 3rd Workshop on Constraints and Language Processing (CSLP-06), pages 41–50,
Sydney, July 2006. ©2006 Association for Computational Linguistics
Capturing Disjunction in Lexicalization
with Extensible Dependency Grammar
Jorge Marques Pelizzoni
ICMC - Univ. de São Paulo - Brazil
Langue & Dialogue - LORIA - France
Jorge.Pelizzoni@loria.fr
Maria das Graças Volpe Nunes
ICMC - Univ. de São Paulo - Brazil
gracan@icmc.usp.br
Abstract
In spite of its potential for bidirectionality,
Extensible Dependency Grammar (XDG)
has so far been used almost exclusively
for parsing. This paper represents one of
the first steps towards an XDG-based inte-
grated generation architecture by tackling
what is arguably the most basic among
generation tasks: lexicalization. Herein
we present a constraint-based account of
disjunction in lexicalization, i.e. a way
to enable an XDG grammar to generate
all paraphrases — along the lexicalization
axis, of course — realizing a given in-
put semantics. Our model is (i) efficient,
yielding strong propagation, (ii) modu-
lar and (iii) favourable to synergy inas-
much as it allows collaboration between
modules, notably semantics and syntax.
We focus on constraints ensuring well-
formedness and completeness and avoid-
ing over-redundancy.
1 Introduction
In text generation the term lexicalization (Reiter
and Dale, 2000) refers to deciding which among a
choice of potentially applicable lexical items real-
izing a given intended meaning are actually going
to take part in a generated utterance. It can be re-
garded as a general, necessary generation task —
especially if one agrees that the term task does not
necessarily imply pipelining — and remarkably
pervasive at that. For instance, even though the
realization of such a phrase as “a ballerina” owes
much to referring expression generation, a com-
plementary task, it is still a matter of lexicaliza-
tion whether to prioritize that specific phrase over
all its possible legitimate alternates, e.g. “a female
dancer”, “a dancing woman” or “a dancing female
person”. However, prior to the statement of prior-
itizing criteria or selection preferences and rather
as the very substratum thereto, the ultimate matter
of lexicalization is exactly alternation, choice —
in one word, disjunction.
Given the combinatorial nature of language
and specifically the interchangeability of lexical
items yielding hosts of possible valid solutions to
one and the same lexicalization task instance, disjunction
may well become a major source of (combinato-
rial) complexity. Our subject matter in this pa-
per is solely disjunction in lexicalization as a ba-
sis for more advanced lexicalization models, and
our purpose is precisely to describe a constraint-
based model that (i) captures the disjunctive po-
tential of lexicalization, i.e. allows the generation
of all mutually paraphrasing solutions (according
to a given language model) to any given lexical-
ization task, (ii) ensures well-formedness, espe-
cially ruling out over-redundancy (such as found
in “∗a dancing female dancer/ballerina/woman”)
and syntactic anomalies (“∗a dancer woman”), and
does so (iii) modularly, in that not only are con-
cerns neatly separated (e.g. semantics vs. syntax),
but also solutions are reusable, and future exten-
sions, likely to be developed with no change to
current modules, (iv) efficiently, having an imple-
mentation yielding strong propagation and thus
prone to keep complexity at acceptable levels, and
(v) synergistically, inasmuch as it promotes the interplay
between modules (namely syntax and semantics)
and seems compatible with the concept of integrated
generation architectures (Reiter and Dale,
2000), i.e. those in which tasks are not executed in
pipeline, but are rather interleaved so as to avoid
failed or suboptimal choices during search.
We build upon the Extensible Dependency
Grammar (XDG) (Debusmann et al., 2004b; De-
busmann et al., 2004a; Debusmann et al., 2005)
model and its CP implementation in Oz (Van Roy
and Haridi, 2004), namely the XDG Development
Toolkit (XDK) (Debusmann et al., 2004c), available at
http://www.ps.uni-sb.de/~rade/xdg.html.
In fact, all those appealing goals of modularity,
efficiency and synergy are innate to XDG and
the XDK, and our work can most correctly be
regarded as among the very first attempts at equipping
XDG for generation and fulfilling its bidirectional
promise.
The paper proceeds as follows. Section 2 pro-
vides background information on XDG and the
XDK. Section 3 motivates our lexicalization dis-
junction model and describes it both intuitively
and formally, while Section 4 presents implemen-
tation highlights, assuming familiarity with the
XDK and focusing on the necessary additions and
modifications to it, as well as discussing perfor-
mance. Finally, in Section 5 we conclude and dis-
cuss future work.
2 Extensible Dependency Grammar
An informal overview of XDG’s core concepts is
in order; for a formal description of XDG, how-
ever, see (Debusmann and Smolka, 2006; Debus-
mann et al., 2005). Strictly speaking, XDG is
not a grammatical framework, but rather a descrip-
tion language over finite labelled multigraphs that
happens to show very convenient properties for
the modeling of natural language, among which a
remarkable reconciliation between monostratality,
on one side, and modularity and extensibility, on
the other.
Most of XDG’s strengths stem from its multi-
dimensional metaphor (see Fig. 1), whereby an
(holistic or multidimensional) XDG analysis con-
sists of a set of concurrent, synchronized, comple-
mentary, mutually constraining one-dimensional
analyses, each of which is itself a graph sharing
the same set of nodes as the other analyses, but
having its own type or dimension, i.e., its own
edge label and lexical feature types and its own
well-formedness constraints. In other words, each
1D analysis has a nature and interpretation of its
own, associates each node with one respective in-
stance of a data type of its own (lexical features)
and establishes its own relations/edges between
nodes using labels and principles of its own.
That might sound rather insular at first, but the
1D components of an XDG analysis do in fact
interact. It is exactly their sharing one and the same set of
nodes, whose sole intrinsic property is identity,
that provides the substratum for interdimensional
communication, or rather, mutual constraining.
Figure 1: Three concurrent one-dimensional
analyses. It is the sharing of one same set of
nodes that co-relates and synchronizes them into
one holistic XDG analysis.
[Figure 2 graphic not reproduced: five panels, one per dimension (ID, LP, DS, PA, SC), each over the nodes 1 Mary, 2 wants, 3 to, 4 laugh, 5 “.”]
Figure 2: A 5D XDG analysis for “Mary wants to
laugh.” according to grammar Chorus.ul deployed
with the XDK
That is chiefly achieved by means of two devices,
namely: multidimensional principles and lexical
synchronization.
Multidimensional principles. Principles are
reusable, usually parametric constraint predicates
used to define grammars and their dimensions.
Those posing constraints between two or more 1D
analyses are said to be multidimensional. For example,
the XDK library provides a host of linking princi-
ples, one of whose main applications is to regu-
late the relationship between semantic arguments
and syntactic roles according to lexical specifica-
tions. The framework allows lexical entries to con-
tain features of the type lab(D1) → {lab(D2)},
i.e. mappings from edge labels in dimension D1
to sets of edge labels in D2. Therefore, lexi-
cal entries specifying {pat →{subj}} might be
characteristic of unaccusative verbs, while those
with {agt →{subj}, pat →{obj}} would suit
a class of transitive ones. Linking principles pose
constraints taking this kind of features into ac-
count.
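To make the feature type lab(D1) → {lab(D2)} more concrete, the following Python sketch checks one node against a linking feature. It is not XDK code: the function name linking_ok, the edge representation and the exact reading of the constraint (an argument reached through label l on D1 must carry an incoming D2 edge whose label is in the set mapped from l) are simplifying assumptions made for illustration only.

```python
from typing import Dict, Set, Tuple

# An edge is (mother, label, daughter); a 1D analysis is a set of edges.
Edge = Tuple[str, str, str]

def linking_ok(link: Dict[str, Set[str]], mother: str,
               d1_edges: Set[Edge], d2_edges: Set[Edge]) -> bool:
    """Simplified linking check (an assumption, not the XDK definition):
    whenever `mother` has a D1 edge labelled l to a daughter and link[l]
    is specified, the daughter must have some incoming D2 edge whose
    label belongs to link[l]."""
    for (m, label, daughter) in d1_edges:
        if m != mother or label not in link:
            continue
        incoming = {l2 for (_, l2, d) in d2_edges if d == daughter}
        if not (incoming & link[label]):
            return False
    return True

# Hypothetical entry for a transitive verb: {agt -> {subj}, pat -> {obj}}
transitive = {"agt": {"subj"}, "pat": {"obj"}}
pa = {("eats", "agt", "Mary"), ("eats", "pat", "apples")}
syn_good = {("eats", "subj", "Mary"), ("eats", "obj", "apples")}
syn_bad = {("eats", "obj", "Mary"), ("eats", "subj", "apples")}
```

Under these assumptions, linking_ok(transitive, "eats", pa, syn_good) holds, whereas the swapped roles in syn_bad are rejected.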
Lexical synchronization. The lexicon compo-
nent in XDG is specified in two steps: first, each
dimension declares its own lexicon entry type;
next, once all dimensions have been declared, lex-
icon entries are provided, each specifying the val-
ues for features on all dimensions. Finally, at run-
time it is required of well-formed analyses that
there should be at least one valid assignment of
lexicon entries to nodes such that all principles are
satisfied. In other words, every node must be as-
signed a lexicon entry that simultaneously satisfies
all principles on all dimensions, for which reason
the lexicon is said to synchronize all 1D compo-
nents of an XDG analysis. Lexical synchroniza-
tion is a major source of propagation.
Figure 2 presents a sample 5D XDG analysis
involving the most standard dimensions in XDG
practice and jargon, namely (i) PA, capturing
predicate argument structure; (ii) SC, captur-
ing the scopes of quantifiers; (iii) DS, for deep
syntax, i.e. syntactic structure modulo control and
raising phenomena; (iv) ID, for immediate domi-
nance in surface syntax (as opposed to DS); and
(v) LP, for linear precedence, i.e. a structure
tightly related to ID working as a substratum for
constraints on the order of utterance of words. In
fact, among these dimensions LP is the only one
actually to involve a concept of order. PA and DS,
in turn, are the only ones not constrained to be
trees, but directed acyclic graphs instead. Further
details on the meaning of all these dimensions, as
well as the interactions between them, would be
beyond the scope of this paper and have been dealt
with elsewhere. From Section 3 on we shall focus
on PA and, to a lesser extent, the dimension with
which it interfaces directly: DS.
Emulating deletion. Figure 2 also illustrates the
rather widespread technique of deletion, there ap-
plied to infinitival “to” on dimensions DS, PA,
and SC. As XDG is an eminently monostratal and
thus non-transformational framework, “deletion”
herein refers to an emulation thereof. According
to this technique, whenever a node has but one in-
coming edge with a reserved label, say del, on di-
mension D it is considered as virtually deleted on
D. In addition, one artificial root node is postu-
lated from which emerge as many del edges as re-
quired on all dimensions. The trick also comes in
handy when tackling, for instance, multiword ex-
pressions (Debusmann, 2004), which involve wor-
thy syntactic nodes that conceptually have no se-
mantic counterparts.
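The deletion test itself is simple enough to be sketched in a few lines of Python. The data layout below (a table of incoming edges per dimension and node) and the edges listed are hypothetical and serve only to illustrate the criterion stated above; they are not taken from the XDK.

```python
from typing import Dict, List, Tuple

DEL = "del"  # reserved edge label used to emulate deletion

# incoming[dim][node] lists the (mother, label) pairs of the node's
# incoming edges on dimension dim
Incoming = Dict[str, Dict[str, List[Tuple[str, str]]]]

def is_deleted(node: str, dim: str, incoming: Incoming) -> bool:
    """A node counts as virtually deleted on `dim` if its sole incoming
    edge on that dimension carries the reserved label."""
    edges = incoming[dim].get(node, [])
    return len(edges) == 1 and edges[0][1] == DEL

# Hypothetical fragment: infinitival "to" is deleted on PA but not on ID.
incoming: Incoming = {
    "PA": {"to": [("ROOT", DEL)], "laugh": [("wants", "arge")]},
    "ID": {"to": [("laugh", "part")], "laugh": [("wants", "vinf")]},
}
```

Here is_deleted("to", "PA", incoming) is true while is_deleted("to", "ID", incoming) is false, which mirrors the idea that a node may be deleted on some dimensions only.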
3 Modelling Lexicalization Disjunction
in XDG
Generation input. Having reviewed the basics of
XDG, it is worth mentioning that so far it has
been used mostly for parsing, in which case the in-
put type is usually rather straightforward, namely
typewritten sentences or possibly other text units.
Model creation is also very simple in parsing and
consists of (i) creating exactly one node for each
input token, all nodes being instances of one sin-
gle homogeneous feature structure type automat-
ically inferred from the grammar definition, (ii)
making each node select from all the lexical en-
tries indexed by its respective token, (iii) posing
constraints automatically generated from the prin-
ciples found in the grammar definition and (iv)
deterministically assigning values to the order-
related variables in nodes so as to reflect the actual
order of tokens in input.
As concerns generation, things are not so clear,
though. For a start, take input, which usually
varies across applications and systems, not to
mention the fact that representability and com-
putability of meaning in general are open issues.
Model creation should follow closely, as it is a di-
rect function of input. Notwithstanding, we can
to some extent and advantage tell what genera-
tion input is not. Under the hypothesis of an
XDG-based generation system tackling lexicaliza-
tion, input is not likely to contain some direct rep-
resentation of fully specified PA analyses, much
though this is usually regarded as a satisfactory
output for a parsing system (!). What happens
here is that generating an input PA analysis would
presuppose lexicalization having already been car-
ried out. In other words, PA analyses accounting
for e.g. “a ballerina” and “a dancing female hu-
man being” have absolutely nothing to do with
each other whereas what we wish is exactly to
feed input allowing both realizations. Therefore,
PA analyses are themselves part of generation out-
put and are acceptable as parsing output inasmuch
as “de-lexicalization” is considered a trivial task,
which is not necessarily true, however.
Although our system still lacks a comprehen-
sive specification of input format and semantics,
we have already established on the basis of the
above rationale that our original PA predicates
must be decomposed into simpler, primitive predi-
cates that expose their inter-relations. For the pur-
pose of the present discussion, we understand that
it suffices to specify that our input will contain flat
first-order logic-like conjunctions such as
∃x(dance(x)∧female(x)∧human(x)),
in order to characterize entities, even if the fi-
nal accepted language is sure to have a stricter
logic component than first-order logic and might
involve crossings with yet other formalisms. Pred-
icates, fortunately, are not necessarily unary; and,
for example, “A ballerina tapped a lovely she-dog”
might well be generated from the following input:
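\[
\exists e, x, y \left(
\begin{array}{l}
\mathit{dance}(x) \wedge \mathit{female}(x) \wedge \mathit{human}(x) \wedge {} \\
\mathit{event}(e) \wedge \mathit{past}(e) \wedge \mathit{tap}(e, x, y) \wedge {} \\
\mathit{female}(y) \wedge \mathit{dog}(y) \wedge \mathit{lovely}(y)
\end{array}
\right). \qquad (1)
\]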
Deletion as the substance of disjunction. Nat-
urally, simply creating one node for each input
semantic literal is not at all the idea behind our
model. For example, if “woman” is to be ac-
tually employed in a specific lexicalization task,
then it should continue figuring as one single node
in XDG analyses as usual in spite of potentially
covering a complex of literals. In fact, XDG analyses in
general, and PA analyses specifically, should behave and
look much the same as they used to.
However, one remarkable difference of analyses
in our generation model as compared to parsing
lies in the role and scope of deletion, which in-
deed constitutes the very substance of disjunction
now. By assigning all nodes but the root one extra
lexical entry synchronizing deletion on all dimensions
(see footnote 2), we build an unrestrained form of
disjunction whereby whole sets of nodes may as well
act as if not taking part in the solution. Now it
is possible to create nodes at will, even one for
each applicable lexical item, and rely on the fact
that, many ill-formed outputs as the set of all so-
lutions may contain, it still covers all correct para-
phrases, i.e. those in which all and only the right
nodes have been deleted. For example, should one
node be created for each of “ballerina”, “woman”,
“dancer”, “dancing”, “female” and “person”, all
possible combinations of these words, including
the correct ones, are sure to be generated.
Our design obviously needs further constrain-
ing, yet the general picture should be visible by
now that we really intend to finish model creation
— or rather, start search — with (i) a bunch of per-
fectly floating nodes in that not one edge is given
at this time, all of which are equally willing and
often going to be deleted, and (ii) a bunch of con-
straints to rule out ill-formed output and provide
for efficiency. There are two main gaps in this
summary, namely:
• what these constraints are and
• how exactly nodes are to be created.
This paper restricts itself to the first question. The
second one involves issues beyond lexicalization,
actually permeating all generation tasks, and is
currently our research priority. Consequently, in
all our experiments most of model creation was
handcrafted.
In the name of clarity, we shall hereafter ab-
stract over deletion, that is to say we shall in all re-
spects adhere to the illusion of deletion, that nodes
may cease to exist. Specifically, whenever we refer
to the sisters, daughters and mothers of a node, we
mean those not due to deletion. In other words, all
happens as if deleted nodes had no relation what-
soever to any other node. This abstraction is ex-
tremely helpful and is actually employed in our
implementation, as shown in Section 4.
Footnote 2: That is, specifying valencies such that an incoming del
edge is required on all dimensions simultaneously.
3.1 How Nodes Relate
In the following description, we shall mostly re-
strict ourselves to what is novel in our model as
compared to current practice in XDG modelling.
Therefore, we shall emphasize dimension PA and
the new constraints we had to introduce in order
to have only the desired PA analyses emerge. Ex-
cept for sparse remarks on dimension DS and its
relationship with PA, which we shall also discuss
briefly, we assume without further mention the
concurrence of other XDG dimensions, principles
and concepts (e.g. lexical synchronization) in any
actual application of our model.
Referents, arguments and so nodes meet. For
the most part, ruling out ill-formed output con-
cerns posing constraints on acceptable edges, es-
pecially when one takes into account that all we
have is some floating nodes to start with. Let us
first recall that dimension PA is all about predicate
arguments, which are necessarily variables thanks
to the flat nature of our input semantics. Roughly
speaking, each PA edge relates a predicate with
one of its arguments and thus “is about” one sin-
gle variable. Therefore, our first concern must be
to ensure that every PA edge should land on a node
that “is (also) about” the same variable as the edge
itself.
In order to provide for such an “aboutness”
agreement, so to speak, one must first provide for
“aboutness” itself. Thus, we postulate that every
node should now have two new features, namely
(i) hook, identifying the referent of the node,
i.e. the variable it is about, and (ii) holes, mapping
every PA edge label ℓ into the argument (a
variable) every possible ℓ-labelled outgoing edge
should be about. Normally these features should
be lexicalized. The coincidence with Copestake et
al.’s terminology (Copestake et al., 2001) is not
accidental; in fact, our formulation can be regarded as
a decoupled fragment of theirs, since our holes
involve no syntactic labels and scopal issues are
never touched. As usual in XDG, we leave it
for other modules such as mentioned in the previ-
ous section to take charge of scope and the rela-
tionship between semantic arguments and syntac-
tic roles. The role of these new features is depicted
in Figure 3, in which an arrow does not mean an
edge but the possibility of establishing edges.
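The "aboutness" agreement between holes and hooks can be sketched as follows in Python. The class Node, the function may_receive and the edge labels arg1 and arg2 are hypothetical names chosen for this illustration; in the actual model the corresponding restriction is stated as a constraint over finite-domain variables rather than tested on ground values.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Node:
    word: str
    hook: str                                             # variable the node is about
    holes: Dict[str, str] = field(default_factory=dict)   # PA edge label -> argument variable

def may_receive(mother: Node, label: str, daughter: Node) -> bool:
    """An edge mother -label-> daughter is admissible only if the daughter
    is about the very variable that the mother's hole for `label` is about."""
    return label in mother.holes and mother.holes[label] == daughter.hook

# Hypothetical nodes for tap(e, x, y); the labels arg1/arg2 are assumptions.
tapped = Node("tapped", hook="e", holes={"arg1": "x", "arg2": "y"})
ballerina = Node("ballerina", hook="x")
she_dog = Node("she-dog", hook="y")
```

With these values, "tapped" may send an arg1 edge to "ballerina" but not to "she-dog", since only the former is about x.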
Figure 3: For every node v and on top of e.g. valency
constraints, features hook and holes further
constrain the set of nodes able to receive edges
from v for each specific edge label.
Completeness and compositionality. Next we
proceed to ensure completeness, i.e. that every solution
should convey the whole intended semantic
content. To this end, nodes must have features
holding semantic information, the most basic
of which is bsem, standing for base semantic
content, or rather, the semantic contribution a
lexical entry may make on its own to the whole.
For example, “woman” might be said to contribute
λx.female(x)∧human(x), while “female”, only
λx. female(x). Normally bsem should be lexi-
calized.
In addition, we postulate feature sem for hold-
ing the actual semantic content of nodes, which
should not be lexicalized, but rather calculated
by a principle imposing semantic composition-
ality. In our rather straightforward formulation,
for every node v, sem(v) is but the conjunction
of bsem(v) and the sems of all its PA daughters
thus:
\[
\mathit{sem}(v) = \mathit{bsem}(v) \wedge \bigwedge \{ \mathit{sem}(u) : v \rightarrow_{PA} u \}, \qquad (2)
\]
where $v \xrightarrow{\ell}_{D} u$ denotes that node u is a daughter of
v on dimension D through an edge labelled ℓ (the
absence of the label just denotes that it does not
matter).
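Equation (2) can be mimicked on ground analyses with a short recursive function. In the Python sketch below, a conjunction of literals is represented as a frozenset of strings (the empty set standing for true); this representation and the names Sem and sem are assumptions of the example and say nothing about how the principle is implemented in the XDK.

```python
from typing import Dict, FrozenSet, List

Sem = FrozenSet[str]  # a conjunction of literals; frozenset() stands for `true`

def sem(v: str, bsem: Dict[str, Sem], daughters: Dict[str, List[str]]) -> Sem:
    """sem(v) = bsem(v) conjoined with the sems of all PA daughters of v.
    PA analyses are acyclic, so the recursion terminates."""
    result = set(bsem[v])
    for u in daughters.get(v, []):
        result |= sem(u, bsem, daughters)
    return frozenset(result)

# "a dancing female person": the article dominates the three lexical words on PA.
bsem = {
    "a": frozenset(),
    "dancing": frozenset({"dance(x)"}),
    "female": frozenset({"female(x)"}),
    "person": frozenset({"human(x)"}),
}
daughters = {"a": ["dancing", "female", "person"]}
```

For this analysis, sem("a", bsem, daughters) collects dance(x), female(x) and human(x), while each leaf keeps exactly its bsem.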
Finally, completeness is imposed by means of
node feature axiom, upon which holds the invari-
ant
sem(v) ⇒ axiom(v), (3)
for every node v. The idea is to have axiom as
a lexicalized feature and consistently assign it the
neutralizing constant true for all lexical entries
but those meant for the root node, in which case
the value should equal the intended semantic con-
tent.
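Invariant (3) may be illustrated in the same spirit. The Python sketch below assumes that conjunctions are sets of literals and that implication reduces to set inclusion, which ignores any entailment between distinct literals; the names entails and is_complete are hypothetical.

```python
from typing import Dict, FrozenSet

Sem = FrozenSet[str]
TRUE: Sem = frozenset()  # the neutralizing constant

def entails(premise: Sem, conclusion: Sem) -> bool:
    """premise => conclusion, read as: every literal of the conclusion
    already occurs in the premise (a simplifying assumption)."""
    return conclusion <= premise

def is_complete(sem: Dict[str, Sem], axiom: Dict[str, Sem]) -> bool:
    """Check sem(v) => axiom(v) for every node v."""
    return all(entails(sem[v], axiom[v]) for v in axiom)

intended = frozenset({"dance(x)", "female(x)", "human(x)"})
# Only the root entry carries a non-trivial axiom.
axiom = {"root": intended, "a": TRUE, "dancing": TRUE, "woman": TRUE}
good = {"root": intended, "a": intended,
        "dancing": frozenset({"dance(x)"}),
        "woman": frozenset({"female(x)", "human(x)"})}
# Same nodes, but "dancing" was left out, so the root misses dance(x).
bad = dict(good, root=frozenset({"female(x)", "human(x)"}),
           a=frozenset({"female(x)", "human(x)"}))
```

Only the first assignment satisfies the invariant; the second fails at the root, which is exactly where incompleteness is meant to surface.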
Coreference classes, concentrators and revi-
sions to dimensions PA and DS. The unavoid-
able impediment to propagation is intrinsic choice,
i.e. that between things equivalent and that we
wish to remain so. That is exactly what we would
like to capture for lexicalization while attempting
to make the greatest amount of determinacy avail-
able to minimize failure. To this end, our strategy
is to make PA analyses as flat as possible, with
coreferent nodes — i.e. having the same ref-
erent or hook — organizing in plexuses around,
or rather, directly below hopefully one single node
per plexus, thus said to be a concentrator. This
offers advantages such as the following:
1. the number of leaf nodes, whose sem features
are determinate and equal to their respective
bsems, is maximized;
2. coreferent nodes tend to be either potential
sisters below a concentrator or deleted. This
allows most constraints to be stated in terms
of direct relationships of mother-, daughter-
or sisterhood. Such proximity and concen-
tration is rather opportune because we are
dealing simply with potential relationships as
nodes will usually be deleted. In other words,
our constraints aim mostly at ruling out un-
desired relations rather than establishing cor-
rect ones. The latter must remain a matter of
choice.
It is in order to define which are the best candi-
dates for concentrators. Having different concen-
trators in equivalent alternative realizations, such
as “a ballerina”, “a female dancer” or “a danc-
ing woman” (hypothetical concentrators are un-
derlined), would be rather hampering, since the
task of assigning “concentratingness” would then
be fatally bound to lexicalization disjunction it-
self and not much determinacy could possibly be
derived ahead of committing to this or that realization.
In the face of that, the natural candidate
must be something that remains constant all along,
namely the article. Certainly, what specific arti-
cle and, among others, whether to generate a defi-
nite/anaphoric or indefinite/first-time referring ex-
pression is also a matter of choice, but not pertain-
ing to lexicalization. For the sake of simplicity and
scope, let us stick to the case of indefinite articles,
keeping in mind that possible extensions to our
model to cope with (especially definite anaphoric)
referring expression generation shall certainly re-
quire some revisions.
[Figure 4 graphic not reproduced: DS and PA panels over “a dancing female person ∗”; DS edge labels npd, modd, modd and root; PA edge labels apply, apply, apply and root]
Figure 4: New PA and DS analyses for “a dancing
female person”. An asterisk stands for the root
node
Electing articles for concentrators means that
they now directly dominate their respective nouns
and accompanying modifiers on dimension PA as
shown in Figure 4 for “a dancing female person”.
One new edge label apply is postulated to connect
concentrators with their complements, the follow-
ing invariants holding:
1. for every node v, hook(v) =
holes(v)(apply), i.e. only coreferent
nodes are linked by apply edges;
2. every concentrator lexical entry provides a
valency allowing any number of outgoing
apply edges, though requiring at least one.
Roughly speaking, the intuition behind this new
PA design is that the occurrence of a lexical (as
opposed to grammatical) word corresponds to the
evaluation of a lambda expression, resulting in
a fresh unary predicate built from the bsem
of the word/node and the sems of its children.
In turn, every apply edge denotes the applica-
tion of one such predicate to the variable/referent
of a concentrator. In fact, even verbs might be
treated analogously if Infl constituents were mod-
elled, constituting the concentrators of verb base
forms. Also useful is the intuition that PA abstracts
over most morphosyntactic oppositions, such as
that between nouns and adjectives, which figure
as equals there. The subordination of the latter
word class to the former becomes a strictly syntac-
tic phenomenon or, in any case, other dimensions’
affairs.
Figure 5: Starting conditions with perfectly floating
nodes in the lexicalization of “a ballerina” and
its paraphrases
Dimension DS is all about such oppositions,
however, and should remain much the same except
that the design is rather simplified if DS maintains
concentrator dominance. As a result, articles
must stand as heads of noun (or rather, determiner)
phrases, which is not an unheard-of
approach, just unprecedented in XDG. Naturally,
standard syntactic structures should appear below
determiners, as exemplified in Figure 4. Granted
this, the flatness of PA and its relation to DS can
straightforwardly be accomplished by the applica-
tion of XDK library principles Climbing, whereby
PA is constrained to be a flattening of DS, and Bar-
riers, whereby concentrators are set as obstacles to
climbing by means of special lexical features. Fig-
ure 5 thus illustrates the starting conditions for the
lexicalization of “a ballerina” and its paraphrases,
including the bsems of nodes. Notice that we have
created distinct nodes for different parts of speech
of one same word, “female”. The relevance of this
measure shall become clear over the course of this section as we
develop this example.
Fighting over-redundancy. We currently em-
ploy two constraints to avoid over-redundancy.
The first is complete in that its declarative seman-
tics already sums up all we desire to express in that
matter, while the other is redundant, incomplete,
but supplied to improve propagation.
The complete constraint is imposed between
every node and each of its potential daughters.
Apart from overhead reasons, it might as well be
imposed between every pair of nodes. However,
the set of potential daughters of a node v is best
approximated by function dcands thus:
\[
\mathit{dcands}(v) = \Bigl( \bigcup \{ \langle x \rangle : x \in \mathit{ran}(\mathit{holes}(v)) \} \Bigr) - \{ v \},
\]
where 〈x〉 denotes the coreference class of vari-
able x; and ran(f), the range of function f. It is
worth noticing that in generation dcands is known
at model creation.
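Since dcands depends only on information available at model creation, it can be computed up front. The Python sketch below does so over plain dictionaries; the data layout (holes as a label-to-variable map per node, coref as a variable-to-node-set map) and the node names are assumptions of this example.

```python
from typing import Dict, Set

def dcands(v: str, holes: Dict[str, Dict[str, str]],
           coref: Dict[str, Set[str]]) -> Set[str]:
    """Union of the coreference classes of all variables in the range of
    holes(v), with v itself removed."""
    candidates: Set[str] = set()
    for variable in holes[v].values():
        candidates |= coref.get(variable, set())
    return candidates - {v}

# Candidate nodes for "a ballerina" and its paraphrases; all are about x.
words = {"ballerina", "woman", "dancer", "dancing", "female", "person"}
coref = {"x": words | {"a"}}
holes = {"a": {"apply": "x"}}
holes.update({w: {} for w in words})
```

Here dcands("a", holes, coref) yields the six lexical words, whereas every lexical word has an empty set of potential daughters because its holes are empty.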
Given a node u and a potential daughter v ∈
dcands(u), this constraint involves hypothesiz-
ing what the actual semantic content of u would
be like if v were not among its daughters.
Let $\mathit{hds}_v(u)$ and $\mathit{hsem}_v(u)$ be respectively the
hypothetical set of daughters of u counting v out
and its “actual” semantic content in that case,
which can be defined thus:
\[
\mathit{hds}_v(u) = \{ w : u \rightarrow_{PA} w \} - \{ v \}
\]
and
\[
\mathit{hsem}_v(u) = \mathit{bsem}(u) \wedge \bigwedge \{ \mathit{sem}(w) : w \in \mathit{hds}_v(u) \}. \qquad (4)
\]
The constraint consists of ensuring that, if the ac-
tual semantic content of the potential daughter v
would be subsumed by the hypothetical semantic
content of u, then v can never be a daughter of u.
In other words, each daughter of u must make a
difference. Formally, we have the following:
\[
(\mathit{hsem}_v(u) \Rightarrow \mathit{sem}(v)) \rightarrow \neg (u \rightarrow_{PA} v) \qquad (5)
\]
where the two implication symbols, ⇒ and →
have the same interpretation in this logic state-
ment, but are nonetheless distinguished because
their implementations are radically different as
shall be discussed in Section 4. Constraint (5)
is especially active after some choices have been
made. Suppose, in our “a ballerina” example,
that “dancing” is the only word selected so far
for lexicalization. Let u and v be respectively
the nodes for “a” and “dancing”. In this case,
the consequent in (5) is false and so must be the
antecedent hsem_v(u) ⇒ dance(x), which implies
that hsem_v(u) can never “contain” the literal
dance(x). From (4) and the fact that articles have
neutral base semantics — i.e. bsem(u) = true —
it follows that all further daughters of u must not
imply dance(x). As that does not hold for “bal-
lerina” and “dancer”, these nodes are ruled out as
daughters of u and thus deleted for lack of moth-
ers. Conversely, if “ballerina” had been selected
in the first place, (5) would trivially detect the re-
dundancy of all other words and analogously en-
tail their deletion.
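The effect of constraint (5) on ground analyses can be reproduced with a small checker. The Python sketch below again treats conjunctions as sets of literals and implication as set inclusion; the bsem values assigned to the words (e.g. “dancer” as dance(x) ∧ human(x)) are assumptions made for the example, and the function names are hypothetical.

```python
from typing import Dict, FrozenSet, List

Sem = FrozenSet[str]

def hsem(u: str, v: str, bsem: Dict[str, Sem], sem: Dict[str, Sem],
         daughters: Dict[str, List[str]]) -> Sem:
    """Semantic content u would have if v were not among its daughters."""
    result = set(bsem[u])
    for w in daughters[u]:
        if w != v:
            result |= sem[w]
    return frozenset(result)

def redundant_daughters(u: str, bsem: Dict[str, Sem], sem: Dict[str, Sem],
                        daughters: Dict[str, List[str]]) -> List[str]:
    """Daughters of u violating (5), i.e. making no difference to u."""
    return [v for v in daughters[u]
            if sem[v] <= hsem(u, v, bsem, sem, daughters)]

D, F, H = "dance(x)", "female(x)", "human(x)"
bsem = {"a": frozenset()}
# All lexical nodes are leaves here, so their sem equals their bsem.
sem = {"ballerina": frozenset({D, F, H}), "dancer": frozenset({D, H}),
       "woman": frozenset({F, H}), "person": frozenset({H}),
       "dancing": frozenset({D}), "female": frozenset({F})}
```

For instance, with daughters {"a": ["dancing", "dancer"]} the checker flags “dancing”, whereas {"a": ["dancing", "female", "person"]} passes with no redundant daughter.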
In turn, the redundant constraint ensures that,
for every pair of coreferent nodes u and v ∈
〈hook(u)〉, if the actual semantic content of v
is subsumed by that of u, then they can never be sisters.
Formally:
\[
(\mathit{sem}(u) \Rightarrow \mathit{sem}(v)) \rightarrow (v \notin \mathit{sisters}_{PA}(u)). \qquad (6)
\]
This constraint is remarkable for being active
even in the absence of choice since it is established
between potential sisters, which usually have their
sems sufficiently, if not completely, determined.
Surprisingly enough, the main effect of (6) is on
syntax, by constraining alliances on DS. As our
new version of the XDK’s Climbing principle is
now aware of sisterhood constraints, it will con-
strain every node on PA to have as a mother on
DS either its current PA mother or some node belonging
to one of its PA sister trees (see footnote 3). In ground
terms, when (6) detects that e.g. “woman” sub-
sumes “female (adj./n.)” and constrains them not
to be sisters on PA, the Climbing principle will
rule out “woman” as a potential DS mother of
“female (adj.)”. It is worth mentioning that once
v ∉ sisters_D(u) is imposed, our sisterhood constraints
entail u ∉ sisters_D(v).
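Because (6) only needs the sems of two potential sisters, its consequences can be tabulated before any choice is made. The following Python sketch does this for leaf candidates, whose sem equals their bsem; the literal sets assigned to each word and the function names are assumptions of the example, and subsumption is once more approximated by set inclusion.

```python
from typing import Dict, FrozenSet, Set

Sem = FrozenSet[str]

def may_be_sisters(u: str, v: str, sem: Dict[str, Sem]) -> bool:
    """Apply (6) in both directions: neither node may subsume the other."""
    return not (sem[v] <= sem[u]) and not (sem[u] <= sem[v])

def potential_sisters(v: str, sem: Dict[str, Sem]) -> Set[str]:
    return {u for u in sem if u != v and may_be_sisters(u, v, sem)}

D, F, H = "dance(x)", "female(x)", "human(x)"
sem = {"ballerina": frozenset({D, F, H}), "dancer": frozenset({D, H}),
       "woman": frozenset({F, H}), "person": frozenset({H}),
       "dancing": frozenset({D}),
       "female (adj.)": frozenset({F}), "female (n.)": frozenset({F})}
```

Under these assumptions, potential_sisters("female (n.)", sem) leaves only “dancing”, “person” and “dancer”, while “woman” and “female (adj.)” are excluded from being sisters of each other.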
Redundant compositionality constraints. Al-
though a complete statement of semantic compo-
sitionality is given by Equation 2, we introduce
two redundant constraints to improve propagation.
The first of them attempts to advance detection of
nodes whose semantic contribution is strictly re-
quired even before the sem features of their moth-
ers become sufficiently constrained. It does so
by means of a strategy analogous to that of (5),
namely by hypothesizing, for every node v, what
the total semantic content would be like if v were
deleted. Let root, hdown_v(u) and htotsem_v be
respectively the root node, the set of nodes directly
or indirectly below u counting v out, and the total
semantic content supposing v is deleted, which
can be defined thus:
\[
\mathit{hdown}_v(u) = \mathit{down}_{PA}(u) - \{ v \}
\]
and
\[
\mathit{htotsem}_v = \bigwedge \{ \mathit{sem}(u) : u \in \mathit{hdown}_v(\mathit{root}) \}.
\]
Footnote 3: If the subgraphs option is active, which is the case here.
The constraint can be formally expressed thus:
\[
\mathit{deleted}(v) \rightarrow (\mathit{htotsem}_v \Rightarrow \mathit{sem}(v)). \qquad (7)
\]
Unfortunately, (7) is not of much use in our current
example, better applying to cases where there are a
greater number of alternative inner nodes. For ex-
ample, in the lexicalization of (1), this constraint
was immediately able to infer that “lovely” must
not be deleted since it was the sole node contribut-
ing lovely(y).
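The inference about “lovely” can be approximated without running search. The Python sketch below bounds htotsem_v from above by the conjunction of the bsems of all other candidate nodes, and uses the fact that sem(v) always contains bsem(v); if even that bound fails to cover bsem(v), then by the contrapositive of (7) node v cannot be deleted. The bound, the candidate words and the function name are assumptions of this example, not the propagation performed by the actual implementation.

```python
from typing import Dict, FrozenSet, Set

Sem = FrozenSet[str]

def must_not_be_deleted(v: str, bsem: Dict[str, Sem]) -> bool:
    """True if no combination of the other candidates could supply
    everything v contributes, so that deleting v would violate (7)."""
    others: Set[str] = set()
    for u, literals in bsem.items():
        if u != v:
            others |= literals
    return not (bsem[v] <= others)

# Hypothetical candidates for the referent y of input (1).
bsem = {"she-dog": frozenset({"female(y)", "dog(y)"}),
        "dog": frozenset({"dog(y)"}),
        "female": frozenset({"female(y)"}),
        "lovely": frozenset({"lovely(y)"})}
mandatory = {v for v in bsem if must_not_be_deleted(v, bsem)}
```

With these candidates, only “lovely” turns out to be mandatory, since every other literal is offered by at least two nodes.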
The second redundant compositionality con-
straint attempts to advance detection of nodes not
counting on enough potential sisters to fulfill the
actual semantic content of their mothers. To this
end, for every node v, the following constraint is
imposed:
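\[
\bigwedge \{ \mathit{sem}(u) : u \rightarrow_{PA} v \}
=
\Bigl( \bigwedge \{ \mathit{bsem}(u) : u \rightarrow_{PA} v \}
\wedge \bigwedge \{ \mathit{sem}(u) : u \in \mathit{eqsis}_{PA}(v) \} \Bigr), \qquad (8)
\]
where
\[
\mathit{eqsis}_{D}(v) =
\begin{cases}
\emptyset, & \text{if } v \text{ is deleted on } D \\
\mathit{sisters}_{D}(v) \cup \{ v \}, & \text{otherwise,}
\end{cases} \qquad (9)
\]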
which reads “the actual semantic content of the
mothers of a node is equal to their base seman-
tic content in conjunction with the actual se-
mantic content of this node and its sisters”. It
is worth noticing that, when v is deleted, both
{u : u →_PA v} and eqsis_PA(v) become empty so
that (8) still holds. This constraint is especially
interesting because our new versions of principles
Climbing and Barriers, which hold between DS
and PA, propagate sisters constraints in both di-
rections. In association with (6) and (8), these
principles promote an interesting interplay be-
tween syntax and semantics. Resuming our ex-
ample, let v be node “female (n.)”. Before any
selection is performed, constraint (6) infers that
only “dancing”, “person” and “dancer” can be sis-
ters to v on PA and thus (now due to Climbing)
daughters to v on DS. They cannot be mothers
to v because its valency on DS and Climbing are
enough to establish that, if v has any mother at
all on DS, it is “a”. Again taking the DS valency
of v into account, it is possible to infer that, if v
has any daughter at all on DS, it is “dancing”, i.e.
the only adjective in the original set of candidate
daughters. It is the new sisterhood-aware version
of Barriers that propagates this new piece of in-
formation back to PA. This principle now knows
that the sisters of v on PA must come from ei-
ther (i) the tree below v on DS, (ii) one of its DS
sister trees or (iii) some DS tree whose root be-
longs to eqsisDS(inter) for some node inter ap-
pearing — on DS — between v and one of its
mothers on PA. In our example, (ii) and (iii) are
known to be empty sets, while (i) is at most “danc-
ing”. Consequently, “dancing” is the only poten-
tial PA sister of v. Now (8) is finally able to con-
tribute. As “a” is the only possible DS mother
of v and any article has empty basic semantics,
one is entitled to equate $\bigwedge\{\mathit{sem}(u) : u \rightarrow_{PA} v\}$ to
$\bigwedge\{\mathit{sem}(u) : u \in \mathit{eqsis}_{PA}(v)\}$. Even though it is
not known whether v will ever have mothers or
daughters, (8) knows that the left-hand side of the
equation yields either the whole intended seman-
tics or nothing, while the right-hand side yields ei-
ther nothing or at most dance(x) ∧ female(x).
Since the latter falls short of the whole intended
semantics, the only solution to the equation is
nothing on both sides, implying that eqsisPA(v)
is empty and thus, by definition (9), that v is deleted.
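Constraint (8), definition (9) and the final step of the argument above can be sketched in Python as follows. This is only an illustration on ground values, not the propagators of the actual implementation. Conjunction is modelled as set union. The PA graph (nodes m, d1, d2 and a deleted node x), the placeholder literals p, q, r, and the input semantics assumed for the “female (n.)” example are all hypothetical. The check is applied only to non-root nodes, since how the root is treated is not spelled out here.

```python
def big_and(sets):
    """Conjunction of literal sets, modelled as set union."""
    out = set()
    for s in sets:
        out |= s
    return out


def eqsis(v, mothers, daughters, deleted):
    """Definition (9): empty if v is deleted, else v plus its sisters."""
    if v in deleted:
        return set()
    sis = {v}
    for m in mothers[v]:
        sis |= daughters[m]
    return sis


def holds_8(v, bsem, sem, mothers, daughters, deleted):
    """Check constraint (8) for node v on ground values."""
    ms = mothers[v]
    lhs = big_and(sem[u] for u in ms)
    rhs = big_and(bsem[u] for u in ms) | big_and(
        sem[u] for u in eqsis(v, mothers, daughters, deleted)
    )
    return lhs == rhs


# Hypothetical PA graph: m is the mother of d1 and d2; x is deleted.
mothers = {"m": set(), "d1": {"m"}, "d2": {"m"}, "x": set()}
daughters = {"m": {"d1", "d2"}, "d1": set(), "d2": set(), "x": set()}
deleted = {"x"}
bsem = {"m": {"p"}, "d1": {"q"}, "d2": {"r"}, "x": {"q", "r"}}
# The sem value of the deleted node is never consulted by the check.
sem = {"m": {"p", "q", "r"}, "d1": {"q"}, "d2": {"r"}, "x": set()}

ok = all(
    holds_8(v, bsem, sem, mothers, daughters, deleted)
    for v in ("d1", "d2", "x")
)

# Final step of the "female (n.)" argument: the left-hand side is the
# whole semantics or nothing; the right-hand side is nothing or at most
# dance(x) and female(x). The whole semantics below is an assumption.
whole = frozenset({"dance(x)", "female(x)", "person(x)"})
lhs_options = [frozenset(), whole]
rhs_options = [
    frozenset(),
    frozenset({"female(x)"}),
    frozenset({"female(x)", "dance(x)"}),
]
solutions = [(l, r) for l in lhs_options for r in rhs_options if l == r]
```

With these values, ok is true for the consistent assignment, and the enumeration leaves a single solution in which both sides are empty, which corresponds to v being deleted.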
Such strong interplay is only possible because
we have created distinct nodes for the different
parts of speech — or rather, the two different DS
valencies — of “female”. With somewhat more
complicated, heavier constraints it would be pos-
sible to have the same propagation for one sin-
gle node selecting from different parts of speech.
Notwithstanding, that does not seem worth the ef-
fort because a model creation algorithm would be
perfectly able to detect the diverging DS valencies,
create as many nodes as needed and distribute the
right lexical entries among them.
4 Implementation and Performance Remarks
The ideas presented in Section 3 were fully
implemented in a development branch of the XDK.
As with the original XDK, all development is
based on the multiparadigm programming system
Mozart (http://www.mozart-oz.org).
The implementation closely follows the original
CP approach of the XDK and strongly reflects the
constraints we have presented after some rather
standard transformations to CP, namely:
• variable identifiers in hooks and holes, as
well as all semantic input literals such as
human(x) and tap(e,x,y), are encoded as
integer values. Features bsem/sem are
implemented as set constants/variables of such
integers;
• logical conjunction ∧ is thus modelled by set
union ∪. Each “big” conjunction is reduced to the
form $\bigwedge\{f(v) : v \in V\}$, where V is a set
variable of integer-encoded node identifiers, and
modelled by a union selection constraint
$\bigcup\langle f(1)\; f(2)\; \ldots\; f(M)\rangle[V]$,
where M is the maximum node identifier and which
constrains its result (a set variable) to be the
union of f(v) for all v ∈ V;
• implications of the form x ⇒ y are imple-
mented as y ⊆ x, while those of the form
x → y as reify(x) ≤ reify(y), where
the result of reify(x) is an integer-encoded
boolean variable constrained to coincide with
the truth-value of expression x.
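The three transformations listed above can be mimicked on ground values with a short Python sketch. It is not the Mozart/Oz code of the XDK branch: the functions below merely evaluate the relations, whereas the real constraints propagate over set and integer variables. All function names are hypothetical.

```python
def encode(literals):
    """Map each distinct literal to a positive integer code."""
    return {lit: i for i, lit in enumerate(sorted(literals), start=1)}


def union_select(fs, V):
    """Union selection <f(1) ... f(M)>[V]: union of f(v) for v in V.
    fs[v - 1] holds f(v), since node identifiers start at 1."""
    out = set()
    for v in V:
        out |= fs[v - 1]
    return out


def implies_sem(x, y):
    """x => y on conjunctions encoded as sets: y is a subset of x."""
    return y <= x


def implies_bool(x, y):
    """x -> y on truth values: reify(x) <= reify(y)."""
    return int(x) <= int(y)


codes = encode({"human(x)", "tap(e,x,y)"})

# f(1), f(2), f(3) for three hypothetical nodes.
fs = [{codes["human(x)"]}, {codes["tap(e,x,y)"]}, set()]
selected = union_select(fs, {1, 2})
```

Here selected contains the codes of both literals, and implies_sem(selected, {codes["human(x)"]}) holds because the larger conjunction entails each of its conjuncts.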
Our branch of the XDK now includes two new
principles, namely (i) Delete, which requires the
Graph principle, creates doubles for the node
attributes introduced by the latter (thus providing
the illusion of deletion), and introduces features
for sisterhood constraints; and (ii) Compsem, which
imposes all constraints described in Section 3.
A few preliminary proof-of-concept experi-
ments were carried out with input similar to (1)
and linguistically and combinatorially analogous
to our “ballerina” example. In all of them, the sys-
tem was able to generate all paraphrases with no
failed state (backtracking) in search, which means
that propagation was maximal for all cases.
Although our design supports more complex
linguistic constructs such as relative clauses and
prepositional phrases and is expected to behave
similarly for those cases, we have not run any such
experiments so far, because we are currently
prioritizing the issue of model creation and the
coverage of other generation tasks.
5 Conclusions and Future Work
In this paper we have presented the results of the
very first steps towards the application of XDG to
Natural Language Generation, hopefully in an in-
tegrated architecture. Our main contribution and
focus was a formulation of lexicalization disjunc-
tion in XDG terms, preserving the good properties
of modularity and extensibility while achieving
good propagation. We also hope to have
demonstrated how strong the interplay between
linguistic dimensions can be in XDG. Issues as
basic as the very nature of the input were also
discussed, as evidence that there is still a long
way to go. We are currently working on extending
our design to cover generation tasks other than
lexicalization and to perform model creation.
Acknowledgements
We would like to thank Claire Gardent and Denys
Duchier for all their invaluable insights and
comments, and the Langue & Dialogue group and the
Brazilian agencies CNPq and CAPES for funding this
research project and travel to COLING/ACL.

References
Copestake, A. A., Lascarides, A., and Flickinger, D.
(2001). An algebra for semantic construction in
constraint-based grammars. In Meeting of the As-
sociation for Computational Linguistics, pages 132–
139.
Debusmann, R. (2004). Multiword expressions as de-
pendency subgraphs. In Proceedings of the ACL
2004 Workshop on Multiword Expressions: Integrat-
ing Processing, Barcelona/ESP.
Debusmann, R., Duchier, D., Koller, A., Kuhlmann,
M., Smolka, G., and Thater, S. (2004a). A rela-
tional syntax-semantics interface based on depen-
dency grammar. In Proceedings of the COLING
2004 Conference, Geneva/SUI.
Debusmann, R., Duchier, D., and Kruijff, G.-J. M.
(2004b). Extensible dependency grammar: A new
methodology. In Proceedings of the COLING
2004 Workshop on Recent Advances in Dependency
Grammar, Geneva/SUI.
Debusmann, R., Duchier, D., and Niehren, J. (2004c).
The XDG grammar development kit. In Proceedings
of the MOZ04 Conference, volume 3389 of Lec-
ture Notes in Computer Science, pages 190–201,
Charleroi/BEL. Springer.
Debusmann, R., Duchier, D., and Rossberg, A. (2005).
Modular Grammar Design with Typed Parametric
Principles. In Proceedings of FG-MOL 2005, Ed-
inburgh/UK.
Debusmann, R. and Smolka, G. (2006). Multi-
dimensional dependency grammar as multigraph de-
scription. In Proceedings of FLAIRS-19, Melbourne
Beach/US. AAAI.
Reiter, E. and Dale, R. (2000). Building Natural Lan-
guage Generation Systems. Cambridge University
Press.
Van Roy, P. and Haridi, S. (2004). Concepts, Tech-
niques, and Models of Computer Programming.
MIT Press.
