Proceedings of the 8th International Workshop on Tree Adjoining Grammar and Related Formalisms, pages 147–152,
Sydney, July 2006. ©2006 Association for Computational Linguistics
Modeling and Analysis of Elliptic Coordination by Dynamic Exploitation
of Derivation Forests in LTAG parsing
Djamé Seddah (1) & Benoît Sagot (2)
(1) NCLT - Dublin City University - Ireland
djame.seddah@computing.dcu.ie
(2) Projet ATOLL - INRIA - France
benoit.sagot@inria.fr
Abstract
In this paper, we introduce a generic approach to elliptic coordination modeling through the parsing of LTAG grammars. We show that erased lexical items can be replaced during parsing by information gathered in the other member of the coordinate structure and used as a guide at the derivation level. Moreover, we show how this approach can be implemented as a light extension of the LTAG formalism through a so-called “fusion” operation and by the use of tree schemata during parsing in order to obtain a dependency graph.
1 Introduction
The main goal of this research is to provide a way of solving elliptic coordination through the use of derivation forests. The use of this device implies that the resolution mechanism depends on syntactic information; therefore, we will not deal with anaphora resolution or scope modifier problems. We show how to generate a derivation forest described by a set of context-free rules (similar to (Vijay-Shanker and Weir, 1993)) augmented by a stack of current adjunctions when a rule describes a spine traversal. We first briefly discuss the linguistic motivations behind the resolution mechanism we propose, then introduce the fusion operation and show how it can be compared to the analyses of (Dalrymple et al., 1991) and (Steedman, 1990) and how it differs from (Sarkar and Joshi, 1996). We assume that the reader is familiar with the Lexicalized Tree Adjoining Grammar formalism (Joshi and Schabes, 1992).
2 Linguistic Motivations: a Parallelism of Derivation
The LTAG formalism provides a derivation tree which is strictly the history of the operations needed to build a constituent structure, the derived tree. In order to be fully appropriate for semantic inference¹, the derivation tree should display every syntactico-semantic argument and therefore should be a graph. However, obtaining this kind of dependency structure when it is not possible to rely on lexical information, as opposed to (Seddah and Gaiffe, 2005a), is significantly more complicated. An example of this is provided by elliptic coordination.
Consider the sentences in Figure 3. They can all be analyzed as coordinations of S categories² with one side lacking one mandatory argument. In (4), one could argue for VP coordination, because the two predicates share the same continuum (same subcategorization frame and semantic space). However, the S hypothesis is more generalizable and more easily supports the analysis of coordination of unlike categories (“John is a republican and proud of it” becomes “John_i is_j a republican and ε_i ε_j proud of it”).
The main difficulty is to separate the cases where a true co-indexation occurs ((2) and (4)) from the cases of partial duplication (in (1), the predicate is not shared and its feature structures could differ in aspect, tense or number³). In an elliptic construction, some words are unrealized. Therefore, their associated syntactic structures are also unrealized, at least to some extent. However, our aim is to get, as a result of the parsing process, the full constituency and dependency structures of the sentence, including erased semantic items (or units) and their (empty) syntactic positions. Since their syntactic realizations have been erased, the construction of the dependency structure cannot be anchored to lexical items. Instead, it has to be anchored on non-realized lexical items and guided by the dependency structure of the reference phrase. Indeed, it is because of the parallelism between the reference phrase and the elliptical phrase that an ellipsis can be interpreted.
¹ As elementary trees are lexicalized and must have a minimal semantic meaning (Abeillé, 1991), the derivation tree can be seen as a dependency tree with respect to the restrictions defined by (Rambow and Joshi, 1994) and (Candito and Kahane, 1998), to cite a few.
² P for Phrase in French, in the figures given in the annex.
³ See “John loves_i Mary and children_i their gameboy”.
3 The Fusion Operation
In this research, we assume that every coordinator which occurs in elided sentences anchors an initial tree αconj rooted by P and with two substitution nodes of category P (Figure 1).
[Figure 1: root Pαconj with daughters PαconjG↓, et, PαconjD↓]
FIG. 1 – Initial Tree αconj
The fusion operation replaces the missing derivations on either side of the coordinator by the corresponding ones from the other side. It should be noted that the fusion provides proper node sharing when it is syntactically decidable (cf. 6.4). The implementation relies on the use of non-lexicalized trees (i.e., tree schemata) called ghost trees. Their purpose is to support partial derivations which will be used to rebuild the derivation walk in the elided part. We call these partial derivations ghost derivations. The incomplete derivations from the tree γ are shown as a broken tree in Figure 2. The ghost derivations are induced by the inclusion of the ghost tree α′, which must be the schema of the tree α. When the two derivation structures from γ and α′ are processed by the fusion operation, a complete derivation structure is obtained.
[Figure 2: two panels, “Derivations before the Fusion” and “After the Fusion”, showing the trees αconj, α and γ and the ghost tree α′]
FIG. 2 – Derivation sketch of the Fusion Operation
4 Example Analysis
Let us go back to the following sentences:
(1) Jean aime_i Marie et Paul ε_i Virginie
John loves Mary and Paul Virginia
(2) Paul_i aime Virginie et ε_i déteste Marie
Paul loves Virginia and hates Mary
Obviously, (1) can have as a logical formula aime′(jean′,marie′) ∧ aime′(paul′,virginie′), whereas (2) is rewritten as aime′(paul′,virginie′) ∧ déteste′(paul′,marie′). The question is how to differentiate the two occurrences of aime′ in (1) from the two occurrences of paul′ in (2). Of course, the second should be noted as a sharing of the same argument, whereas the first is a copy of the predicate aime′. Therefore, in order to represent the sharing, we will use the same node in the dependency graph, while a ghosted node (noted ghost(γ) in our figures) will be used in the other case. This leads to the analysis in Figure 4. Exactly what level of information should be copied is outside the scope of this paper, but our intuition is that a state between a pure anchored tree and a tree schema is probably the correct answer. As we said, aspect, tense and in most cases diathesis⁴ are shared, as shown by the following sentences:
(3) *Paul killed John and Bill by Rodger
(4) *Paul ate apple and Mary will pears
As opposed to (4), we believe “Paul ate apples and Mary will do pears” to be correct, but in this case we do not strictly have an ellipsis but a semi-modal verb which is subsumed by its co-referent. Our proposal, however, focuses on the syntax-semantics interface, mainly on missing syntactic arguments.
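The distinction drawn in this section between a shared argument and a copied predicate can be illustrated with a small dependency graph. The following Python fragment is only a sketch under our own assumptions: the names DepNode, gapping_graph, shared_subject_graph and governors are hypothetical and do not come from any implementation described in this paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DepNode:
    """A node of the dependency graph (hypothetical representation)."""
    label: str
    ghost_of: Optional[str] = None  # set when the node is a ghosted copy


Edge = Tuple[DepNode, DepNode]  # (predicate, argument)


def gapping_graph() -> List[Edge]:
    """(1) Jean aime Marie et Paul ε Virginie: the predicate is copied."""
    aime = DepNode("aime")
    ghost = DepNode("ghost(aime)", ghost_of="aime")
    return [
        (aime, DepNode("Jean")), (aime, DepNode("Marie")),
        (ghost, DepNode("Paul")), (ghost, DepNode("Virginie")),
    ]


def shared_subject_graph() -> List[Edge]:
    """(2) Paul aime Virginie et ε déteste Marie: the argument is shared."""
    paul = DepNode("Paul")  # a single node, reached from both predicates
    return [
        (DepNode("aime"), paul), (DepNode("aime"), DepNode("Virginie")),
        (DepNode("déteste"), paul), (DepNode("déteste"), DepNode("Marie")),
    ]


def governors(edges: List[Edge], argument: str) -> List[str]:
    """Labels of the predicates governing the node labelled `argument`."""
    return [pred.label for pred, arg in edges if arg.label == argument]
```

In this sketch, Paul has two governors in (2), which makes the structure a graph, whereas in (1) Paul is governed only by the ghosted copy of aime.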
5 Ghost Trees and Logical Abstractions
Looking at the approaches proposed by either (Dalrymple et al., 1991) or (Steedman, 1990) for the treatment of sentences with gaps, we note that in both cases⁵ one wants to abstract the realized element on one side of the coordination in order to instantiate it in the other conjunct, using the coordinator as the pivot of this process. In our analysis, this is exactly the role of ghost trees: to support such abstraction (whether expressed as a higher-order variable or as a λ-abstraction). In this regard, the fusion operation only has to check that the derivations induced by the ghost tree superimpose well with the derivations of the realized side.
This is where our approach differs strongly from (Sarkar and Joshi, 1996). Using the fusion operation involves inserting partial derivations, which are linked to already existing ones (the realized derivations), into the shared forest, whereas using the conjoin operation defined in (Sarkar and Joshi, 1996) involves merging nodes from different trees, while the tree anchored by a coordinator acts similarly to an auxiliary tree with two foot nodes. This may make it difficult to derive a linear string from what is now a DAG. In our approach, we use empty lexical items in order to leave traces in the derivation forest and to obtain a syntactically motivated derived tree (cf. Figure 5) if we extract only the regular LTAG “derivation items” from the forest.
⁴ With respect to the examples of (Dalrymple et al., 1991), i.e., “It is possible that this result can be derived (..) but I know of no theory that does so.”
⁵ Footnote 3, page 5 for (Dalrymple et al., 1991), and pages 41-42 for (Steedman, 1990).
6 LTAG implementation
6.1 Working on shared forest
A shared forest is a structure which combines
all the information coming from derivation trees
and from derived trees. Following (Vijay-Shanker
and Weir, 1993; Lang, 1991), each tree anchored
by the elements of the input sentence is described
by a set of rewriting rules. We use the fact that
each rule which validates a derivation can infer
a derivation item and has access to the whole
chart in order to prepare the inference process.
The goal is to use the shared forest as a guide for
synchronizing the derivation structures from both
parts of the coordinator.
This forest is represented by a context free
grammar augmented by a stack containing the
current adjunctions (Seddah and Gaiffe, 2005a),
which looks like a Linear Indexed Grammar (Aho,
1968).
Each part of a rule corresponds to an item à la Cocke-Kasami-Younger, as described by (Shieber et al., 1995), whose form is <N,POS,I,J,STACK>, with N a node of an elementary tree and POS the situation relative to an adjunction (marked ⊤ if an adjunction is still possible, ⊥ otherwise; this is marked in Figure 5 with a bold dot in high position for ⊤ or a bold dot in low position for ⊥). I and J are the start and end indices of the string dominated by the node N. STACK is the stack containing all the calls of the subtrees which have started an adjunction and which must be recognized by the foot recognition rules. We use S as the starting symbol of the grammar, and n is the length of the initial string. Only the rules which prove a derivation are shown in Figure 6.
The form of a derivation item is Name : <Nodeγto,γfrom,γto,Type,γghost>, where Name is the derivation, typed Type⁶, of the tree γfrom to the node Node of γto.⁷
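To make the two kinds of items concrete, they can be written down as plain records. This is a minimal sketch under our own assumptions: the class and field names (ChartItem, DerivationItem, kind, ghost) are hypothetical, and only the field order and the derivation types (adj, subst, ax, anch and their ghosted variants) are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

TOP, BOTTOM = "top", "bottom"  # stand for the ⊤ and ⊥ positions


@dataclass(frozen=True)
class ChartItem:
    """Item <N, POS, I, J, STACK> of the shared forest."""
    node: str                     # N: a node of an elementary tree
    pos: str                      # POS: TOP if an adjunction is still possible
    i: int                        # I: start index of the dominated string
    j: int                        # J: end index of the dominated string
    stack: Tuple[str, ...] = ()   # pending adjunctions, innermost last


@dataclass(frozen=True)
class DerivationItem:
    """Item Name : <Node, tree_from, tree_to, Type, ghost>."""
    name: str
    node: str
    tree_from: Optional[str]      # None encodes the uninstantiated '?'
    tree_to: Optional[str]
    kind: str                     # adj, subst, ax, anch, adjg, substg, anchg
    ghost: Optional[str] = None   # name of the ghost tree, None encodes '-'

    def is_ghost(self) -> bool:
        """True for the ghosted derivation types."""
        return self.kind in {"adjg", "substg", "anchg"}


# Illustrative values only: a left conjunct spanning positions 0 to 3.
left = ChartItem("P_conjG", TOP, 0, 3)
d1 = DerivationItem("D1", "P_conjG", "alpha1", "alpha_conj", "subst")
```

Frozen records are used here so that items can be stored in sets or used as dictionary keys, which is convenient for a chart; this is a design choice of the sketch, not a requirement of the formalism.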
⁶ Type can be an adjunction (type = adj), a substitution (subst), an axiom (ax), an anchor, which is usually an implicit derivation in an LTAG derivation tree (anch), or a “ghosted” one (adjg, substg, anchg).
⁷ γghost stores the name of the ‘ghost tree’ if the Node belongs to one, or − otherwise.
6.2 Overview of the process
We refer to a ghost derivation as any derivation which occurs in a tree anchored by an empty element, and to a ghost tree as a tree anchored by this empty element. As shown in Figure 5, we assume that the proper ghost tree has been selected. The remaining problem is to know which structure we have to use in order to synchronize our derivation process.
Elliptic substitution of an initial ghost tree on a tree αconj: Consider a tree αconj (see Fig. 1) anchored by a coordinator and an initial tree α1 of root P to be substituted in the leftmost P node of αconj. The rule corresponding to the traversal of the leftmost P node is then
PαconjG(⊤,i,j,−,−) −→ Pα1(⊤,i,j,−,−).
If this rule is validated, we infer a derivation item called D1 : <PαconjG,α1,αconj,subst,−>.
Now, let us assume that the node situated to the right of the coordinating conjunction dominates a phrase whose verb has been erased (as in et Paul _ Virginie) and that there exists a tree of root P with two argument positions (a quasi-tree like N0VN1 in the LTAG literature, for example). This ghost tree is anchored by an empty element and is called αghost. We have a rule, called Call-subst-ghost, describing the traversal of this node:
PαconjD(⊤,j+1,n,−,−) −→ Pαghost(⊤,j+1,n,−,−).
For the sake of readability, let us call D1′ the pseudo-derivation of Call-subst-ghost:
D1′ : <PαconjD, ?, αconj, subst, αghost>,
where the non-instantiated variable, ?, indicates the missing information in the synchronized tree. If our hypothesis is correct, this tree will be anchored by the anchor of α1. So we have to prepare this anchoring by performing a synchronization with existing derivations. This leads us to infer a ghost substitution derivation of the tree α1 on the node PαconjD. The inference rule which produces the item called ghost(α1) in Figure 5 is therefore:
D1′ : <PαconjD, ?, αconj, subst, αghost>
D1 : <PαconjG, α1, αconj, subst, −>
────────────────────────────────
Ghost-D1 : <PαconjD, α1, αconj, substg, αghost>
The process, which is almost the same for the remaining derivations, is described in section 6.4.
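The ghost substitution inference of this section can be read as a function from two premises to one conclusion. The sketch below is only an illustration under our own assumptions: the names Deriv and infer_ghost_subst are hypothetical, '?' and '−' are both encoded as None, and, following the prose above, the inferred item is placed on the node of the elided conjunct.

```python
from typing import NamedTuple, Optional


class Deriv(NamedTuple):
    """Derivation item <Node, tree_from, tree_to, Type, ghost>."""
    node: str
    tree_from: Optional[str]  # None encodes the uninstantiated '?'
    tree_to: str
    kind: str
    ghost: Optional[str]      # None encodes '-'


def infer_ghost_subst(pending: Deriv, realized: Deriv) -> Optional[Deriv]:
    """From D1' (pending, ghost side) and D1 (realized side), build Ghost-D1.

    Returns None when the premises do not match the rule.
    """
    matches = (
        pending.kind == "subst" and pending.tree_from is None
        and pending.ghost is not None            # D1' comes from a ghost tree
        and realized.kind == "subst" and realized.ghost is None
        and realized.tree_from is not None
        and pending.tree_to == realized.tree_to  # same coordination tree
        and pending.node != realized.node        # opposite conjuncts
    )
    if not matches:
        return None
    # The realized tree fills the '?' slot; the type becomes ghosted.
    return Deriv(pending.node, realized.tree_from, pending.tree_to,
                 "substg", pending.ghost)


d1_prime = Deriv("P_conjD", None, "alpha_conj", "subst", "alpha_ghost")
d1 = Deriv("P_conjG", "alpha1", "alpha_conj", "subst", None)
ghost_d1 = infer_ghost_subst(d1_prime, d1)
```

With these illustrative values, ghost_d1 records that α1 is ghost-substituted on the right node of αconj through αghost.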
6.3 Ghost derivation and Item retrieving
In the previous section, we described a ghost derivation as a derivation which deals with a tree anchored by an empty element, whether it is the source tree or the destination tree. In fact, we need to keep marks in the shared forest distinguishing what we are really traversing during the parsing process from what we are synchronizing, which is why we need access to all the necessary information. But the only rule which really knows which tree will be either co-indexed or duplicated is the rule describing the substitution of the realized tree. So we have to get this information by accessing the corresponding derivation item. If we are in a two-phase generation process of a shared forest⁸, we can simultaneously generate the substitution rules for the leftmost and rightmost nodes of the tree anchored by a coordination, and then easily get the right synchronized derivation from the start. Here, we have to fetch this item from the chart, using unification variables through the path of the derivations leading to it.
Let us call “climbing” the process of going from a leaf node N of a tree γ to the node which belongs to the tree anchored by a coordinator (αconj) and which dominates this node. This “climbing” gives us a list of linked derivations (i.e., [<γx(N),γy,γx,Type,IsGhost>, <γz(N),γx,γz,Type1,IsGhost1>, ..], where γ(N) is the node of the tree γ where the derivation takes place⁹). The last returned item is the one which has an exact counterpart in the other conjunct, and which is easy to recover, as shown by the inference rule in the previous section. Given this item, we start the opposite process, called “descent”, which uses the data gathered by the climbing (the derivation starting nodes, the argumental positions marked by an index on nodes in TAG grammars, ...) to follow a parallel path. Our algorithm can be seen as taking the two resulting lists as parameters to produce the correct derivation item. If we apply a two-step generation process (shared forest generation, then extraction), the “descent” and “climbing” phases can be done in parallel, in the same time-efficient way as in (Seddah and Gaiffe, 2005a).
⁸ The first phase is the generation of the set of rules (Vijay-Shanker and Weir, 1993), and the second one is the forest traversal (Lang, 1992). See (Seddah and Gaiffe, 2005b) for a way to generate a shared derivation forest where each derivation rule infers its own derivation item, directly prepared during the generation phase.
⁹ The form of a derivation item is defined in section 6.1.
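The “climbing” step can be pictured as following derivation links upwards until the coordination tree is reached. The following sketch makes a strong simplifying assumption that is not part of the paper: each tree is attached at most once, so the items form a single derivation tree rather than a shared forest. The names Deriv and climb are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Deriv(NamedTuple):
    """Derivation item <Node, tree_from, tree_to, Type, ghost>."""
    node: str
    tree_from: str
    tree_to: str
    kind: str
    ghost: Optional[str]


def climb(items: List[Deriv], start_tree: str, coord_tree: str) -> List[Deriv]:
    """Return the linked derivations leading from start_tree up to coord_tree.

    The last item of the result is the derivation on the coordination tree.
    An empty list means start_tree is not attached under coord_tree.
    """
    path: List[Deriv] = []
    current = start_tree
    visited = set()
    while current != coord_tree:
        if current in visited:
            raise ValueError("cyclic derivation links")
        visited.add(current)
        # Simplification: take the unique derivation attaching `current`.
        step = next((d for d in items if d.tree_from == current), None)
        if step is None:
            return []
        path.append(step)
        current = step.tree_to
    return path


items = [
    Deriv("N0", "Jean", "aime", "subst", None),
    Deriv("P_conjG", "aime", "conj", "subst", None),
]
path = climb(items, "Jean", "conj")
```

In a real shared forest, several derivations may attach the same tree, so the lookup would have to return all candidates; the “descent” would then replay the node labels collected in path on the other conjunct.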
6.4 Description of inference rules
In this section, we describe the inferences relative to the derivations in the right part (resp. left part) of the coordination, as seen in Figure 5. In the remainder of this paper, we describe the inference rules involved in so-called predicative derivations (substitutions and ghost substitutions). Indeed, the status of adjunction is ambiguous. In the general case, when an adjunct is present on only one side of the coordination, there are two possible readings: one reading with an erased (co-indexed) modifier on the other side, and one reading with no such modifier at all on this other side. In the reading with erasing, there is an additional question, which occurs in the substitution case as well: in the derivation structure, shall we co-index the erased node with its reference node, or shall we perform a (partial) copy, hence creating two (partially co-indexed) nodes? The answer to this question is non-trivial, and an appropriate heuristic is needed. A first guess could be the following: any fully erased node (which spans an empty range) is fully co-indexed, and any partially erased node is copied (with partial co-indexation). In particular, erased verbs are always copied, since they cannot occur without non-erased arguments (or modifiers).
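The first-guess heuristic above depends only on the span of the erased node. A minimal sketch, assuming that the indices I and J of section 6.1 are positions between words, so that an empty range is one where both indices are equal; the names Sharing and sharing_policy are hypothetical.

```python
from enum import Enum


class Sharing(Enum):
    COINDEX = "co-index"  # reuse the reference node
    COPY = "copy"         # partial copy with partial co-indexation


def sharing_policy(i: int, j: int) -> Sharing:
    """Decide how an erased node spanning positions i..j is represented."""
    if j < i:
        raise ValueError("invalid span")
    # A fully erased node covers no realized word at all.
    return Sharing.COINDEX if i == j else Sharing.COPY


# "Paul aime Virginie et ε déteste Marie": the erased subject spans nothing.
erased_subject = sharing_policy(4, 4)
# "Jean aime Marie et Paul ε Virginie": the phrase of the erased verb still
# covers its realized arguments Paul and Virginie.
erased_verb_phrase = sharing_policy(4, 6)
```

Under this reading, the special case of erased verbs needs no extra test: their phrase always covers at least one realized argument or modifier, so it never spans an empty range.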
Elliptic substitution of an initial tree α on a ghost tree γghost: If a tree α is substituted in a node Ni of a ghost tree γghost (i.e., derivation g-Der2′ in Figure 5), where i is the traditional index of an argumental position (N0, N1, ...) of this tree; and if there exists a ghost derivation of a substitution of the tree γghost into a coordination tree αconj (Der. g-Der1); and if this ghost derivation pertains to a tree αX in which a substitution derivation exists on node Ni (Der. Der2); then we infer a ghost derivation indicating the substitution of α on the forwarded tree αX through the node Ni of the ghost tree γghost (Der. ghost-Der2).
g-Der2′ : <Niα, α, ?, substg, γghost>
g-Der1 : <PαconjD, αX, αconj, substg, γghost>
Der2 : <NiαX, −, αX, subst, −>
────────────────────────────────
ghost-Der2 : <Niα, α, ghost(αX), substg, γghost>
This is the mechanism seen in the analysis of “Jean aime Marie et Pierre Virginie” to provide the derivation tree.
Elliptic substitution of an initial ghost tree αghost on a tree γ substituted on a tree αconj: Here we are in a kind of opposite situation: we have a realized subtree which lacks one of its arguments, as in Jean_i dormit puis ε_i mourut (John_i slept then ε_i died). So we first have to leave a mark in the shared forest, then fetch the tree substituted on the left part of the coordination, and get the tree which has been substituted on its i-th node; we are then able to infer the proper substitution. We want to create a real link because, as opposed to the previous case, the argument is truly shared; the resulting structure is therefore a graph with two links out of the tree anchored by Jean, one to [dormir] (to sleep) and one to [mourir] (to die).
If a ghost tree αghost is substituted on a node Ni of a tree α (Der. g-Der1′); if this tree α has been substituted on a substitution node, PαconjD, in the rightmost part of a tree αconj anchored by a coordinating conjunction (Der. Der1); if the leftmost node, PαconjG, of αconj received a substitution of a tree αs (Der. Der2); and if this tree has a substitution of a tree αfinal on its i-th node (Der. Der3); then we infer an item indicating a derivation between the tree αfinal and the tree α on its node Ni (Der. g-Der1)¹⁰.
g-Der1′ : <Niαghost, ?, α, substg, αghost>
Der1 : <PαconjD, α, αconj, subst, −>
Der2 : <PαconjG, αs, αconj, subst, −>
Der3 : <Niαs, αfinal, αs, subst, −>
────────────────────────────────
g-Der1 : <Niα, αfinal, α, subst, αghost>
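The four-premise rule for a shared argument can be sketched in the same style. This is an illustration under our own assumptions only: the names Deriv and infer_shared_subst are hypothetical, node labels are kept relative to the receiving tree (so the same label N0 identifies the same argumental position in both conjuncts), and '?' and '−' are encoded as None.

```python
from typing import NamedTuple, Optional


class Deriv(NamedTuple):
    """Derivation item <Node, tree_from, tree_to, Type, ghost>."""
    node: str
    tree_from: Optional[str]  # None encodes the uninstantiated '?'
    tree_to: str
    kind: str
    ghost: Optional[str]      # None encodes '-'


def infer_shared_subst(g_der1p: Deriv, der1: Deriv, der2: Deriv, der3: Deriv,
                       left: str = "P_conjG",
                       right: str = "P_conjD") -> Optional[Deriv]:
    """Link the argument realized in the left conjunct to the right predicate."""
    ok = (
        g_der1p.kind == "substg" and g_der1p.tree_from is None
        and g_der1p.ghost is not None
        # Der1: the tree lacking an argument sits in the right conjunct.
        and der1.kind == "subst" and der1.node == right
        and der1.tree_from == g_der1p.tree_to
        # Der2: some tree alpha_s sits in the left conjunct, same coordinator.
        and der2.kind == "subst" and der2.node == left
        and der2.tree_to == der1.tree_to
        # Der3: alpha_s has a realized argument at the same position Ni.
        and der3.kind == "subst" and der3.tree_to == der2.tree_from
        and der3.node == g_der1p.node
    )
    if not ok:
        return None
    return Deriv(g_der1p.node, der3.tree_from, g_der1p.tree_to,
                 "subst", g_der1p.ghost)


# Jean_i dormit puis ε_i mourut
g1p = Deriv("N0", None, "mourut", "substg", "alpha_ghost")
der1 = Deriv("P_conjD", "mourut", "puis", "subst", None)
der2 = Deriv("P_conjG", "dormit", "puis", "subst", None)
der3 = Deriv("N0", "Jean", "dormit", "subst", None)
g1 = infer_shared_subst(g1p, der1, der2, der3)
```

With these values, the tree anchored by Jean takes part in two substitution derivations, one into dormit (Der3) and one into mourut (the inferred item), which is the graph structure described above.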
¹⁰ Without any restriction, this mechanism can, in the general case, lead to exponential complexity with respect to the length of the sentence.
7 Conclusion
We presented a general framework to model and analyze elliptic constructions using simple mechanisms, namely partial sharing and partial duplication, through the use of a shared derivation forest in the LTAG framework. The main drawback of this approach is the use of tree schemata as part of the parsing process, because the anchoring process requires an extremely precise algorithm for selecting the relevant trees. To the best of our knowledge, this is one of the first times that combining tree schemata, shared forest walking and graph induction, i.e., working with three different levels of abstraction, has been proposed. The mechanism we presented is powerful enough to model much more than the ellipsis of verbal heads and/or some of their arguments. To model elliptic coordinations for a given language, the introduction of a specific saturation feature may be needed to prevent overgeneration (as we presented in (Seddah and Sagot, 2006)). But the same mechanism can be used to go beyond standard elliptic coordinations. Indeed, the use of strongly structured anchors (e.g., with a distinction between the morphological lemma and the lexeme) could allow a fine-grained specification of partial value sharing phenomena (e.g., zeugmas). Apart from an actual large-scale implementation of our approach (both in grammars and parsers), future work includes applying the technique described here to such more complex phenomena.

References
Anne Abeillé. 1991. Une grammaire lexicalisée d’arbres adjoints pour le français. Ph.D. thesis, Paris 7.
Alfred V. Aho. 1968. Indexed grammars - an extension of context-free grammars. J. ACM, 15(4):647–671.
Marie-Hélène Candito and Sylvain Kahane. 1998. Can the TAG derivation tree represent a semantic graph? In Proceedings TAG+4, Philadelphie, pages 21–24.
Mary Dalrymple, Stuart M. Shieber, and Fernando C. N. Pereira. 1991. Ellipsis and higher-order unification. Linguistics and Philosophy, 14(4):399–452.
Aravind K. Joshi and Yves Schabes. 1992. Tree Adjoining Grammars and lexicalized grammars. In Maurice Nivat and Andreas Podelski, editors, Tree automata and languages. Elsevier Science.
Bernard Lang. 1991. Towards a Uniform Formal Framework for Parsing. In M. Tomita, editor, Current Issues in Parsing Technology. Kluwer Academic Publishers.
Bernard Lang. 1992. Recognition can be harder than parsing. In Proceedings of the Second TAG Workshop.
Owen Rambow and Aravind K. Joshi. 1994. A Formal Look at Dependency Grammar and Phrase Structure Grammars, with Special Consideration of Word Order Phenomena. Leo Wanner, Pinter London, 94.
Anoop Sarkar and Aravind Joshi. 1996. Coordination in tree adjoining grammars: Formalization and implementation. In COLING’96, Copenhagen, pages 610–615.
Djamé Seddah and Bertrand Gaiffe. 2005a. How to build argumental graphs using TAG shared forest: a view from control verbs problematic. In Proc. of the 5th International Conference on the Logical Aspects of Computational Linguistics - LACL’05, Bordeaux, France, Apr.
Djamé Seddah and Bertrand Gaiffe. 2005b. Using both derivation tree and derived tree to get dependency graph in derivation forest. In Proc. of the 6th International Workshop on Computational Semantics - IWCS-6, Tilburg, The Netherlands, Jan.
Djamé Seddah and Benoît Sagot. 2006. Modélisation et analyse des coordinations elliptiques via l’exploitation dynamique des forêts de dérivation. In Proc. of Traitement Automatique des Langues Naturelles - TALN 06, Leuven, Belgium, Apr.
Stuart Shieber, Yves Schabes, and Fernando Pereira. 1995. Principles and implementation of deductive parsing. Journal of Logic Programming, 24:3–36.
Mark Steedman. 1990. Gapping as constituent coordination. Linguistics and Philosophy, 13:207–264.
K. Vijay-Shanker and D. Weir. 1993. The use of shared forests in tree adjoining grammar parsing. In EACL ’93, pages 384–393.
