Competence and Performance Grammar
in Incremental Processing
Vincenzo Lombardo
Dipartimento di Informatica
Università di Torino
c.so Svizzera, 185
10149, Torino, Italy
vincenzo@di.unito.it
Alessandro Mazzei
Dipartimento di Informatica
Università di Torino
c.so Svizzera, 185
10149, Torino, Italy
mazzei@di.unito.it
Patrick Sturt
Department of Psychology
University of Glasgow
58 Hillhead Street
Glasgow, G12 8QB, UK
patrick@psy.gla.ac.uk
Abstract
The goal of this paper is to explore some consequences of the dichotomy between competence and performance from the point of view of incrementality. We introduce a TAG-based formalism that encodes a strong notion of incrementality directly into the operations of the formal system. A left-associative operation is used to build a lexicon of extended elementary trees. Extended elementary trees allow derivations in which a single fully connected structure is maintained through the course of a left-to-right word-by-word derivation. In the paper, we describe the consequences of this view for semantic interpretation, and we also evaluate some of the computational consequences of enlarging the lexicon in this way.
1 Introduction
Incremental processing can be achieved with a combination of grammar formalism and derivation/parsing strategy. In this paper we explore some of the computational consequences of deriving the incremental character of the human language processor from the competence grammar. In the following paragraphs, we assume that incremental processing proceeds through a sequence of processing steps. Each step consists of a configuration of partial syntactic structures (possibly connected into only one structure) and a configuration of semantic structures (again, possibly connected into one single expression). These semantic structures result from the application of the semantic interpreter to the syntactic structures in the same processing step. Depending on the semantic rules, some syntactic structures may not be interpretable; that is, some processing steps do not involve an updating of the semantic representation. In the view we present here, competence grammar is responsible for the definition of both the set of well-formed sentences of the language and the set of possible partial structures that are yielded by the derivation process. According to this view, the performance component is responsible for other aspects of language processing, including ambiguity handling and error handling. The latter issues are not addressed in this paper.
In the psycholinguistic and computational literature, many models for incremental processing have been discussed. These models can be characterized in terms of the location of the border between competence and performance. In particular, we discuss the relative responsibility of the competence and performance components in three key areas of syntactic processing: a) the space of well-formed partial syntactic structures; b) the space of the possible configurations of partial syntactic structures at each processing step; c) the sub-space of partial structures that can actually be interpreted.
The definition of well-formedness is almost universally assigned to the competence component, whether in a direct implementation of the grammar formalism (cf. the Type Transparency hypothesis (Berwick and Weinberg, 1984)) or a compiled version of the competence grammar (e.g. LR parsing (Shieber and Johnson, 1993)).
The space of the possible configurations of partial structures refers to those partial syntactic structures that are built and stored during parsing or derivation. Different algorithms result in different possibilities for the configurations of partial structures that the parser builds. For example, a bottom-up algorithm will never build a partial structure with non-terminal leaf nodes. The standard approach is to assign this responsibility to the parsing algorithm, whether the grammar is based on standard context-free formalisms (Roark, 2001), on generative syntactic theories based on a context-free backbone (Crocker, 1992), or on categorial approaches such as Combinatory Categorial Grammar (CCG; Steedman, 2000). A different method is to assign this responsibility to the competence component. In this case the space of possible configurations of partial structures is constrained by the grammatical derivation process itself, and the parsing algorithm needs to be aligned with these requirements. This approach is exemplified by the works of Kempson et al. (2000) and Phillips (2003), who argue that many problems in theoretical syntax, like the definition of constituency, can be solved by extending this responsibility to the competence grammar.
This issue of constituency is also relevant in the third key area, which is the definition of the space of interpretable structures. The assignment of responsibility with respect to current approaches usually depends on the implementation of the incremental technique. Approaches based on a coupling of syntactic and semantic rules in the competence grammar (Steedman, 2000; Kempson et al., 2000) adhere to the so-called Strict Competence Hypothesis (Steedman, 2000), which constrains the interpreter to deal only with grammatical constituents, so the responsibility for deciding the interpretable partial structures is assigned to competence (see footnote 1). In contrast, approaches that are based on competence grammars that do not include semantic rules, like CFG, implement semantic interpreters that mimic such semantic rules (Stabler, 1991), and so they assign the responsibility for deciding the interpretable partial structures to performance.
In this paper we explore the empirical consequences of building a realistic grammar when the formalism constrains all three of these areas, as is the case with Kempson et al. (2000) and Phillips (2003). The work relies upon the Dynamic Version of Lexicalized Tree Adjoining Grammar (DV-TAG), introduced in Lombardo and Sturt (2002b), a formalism that encodes a dynamic grammar (cf. Milward (1994)) in LTAG terms (Joshi and Schabes, 1997). The consequence of encoding a dynamic grammar is that the configurations of partial structures discussed above are limited to fully connected structures, that is, no disconnected structures are allowed in a configuration. In particular, the paper focuses on the problem of building a realistic DV-TAG grammar through a conversion from an LTAG, in order to maintain the linguistic significance of elementary trees while extending them to allow full connectivity.
Footnote 1: Notice that these approaches may, however, differ in the time-course with which semantic rules are applied in the interpreter, and this issue depends directly on the space of configurations of partial structures discussed above.
2 Dynamic Version of Tree
Adjoining Grammar
This section reviews the major aspects of the Dynamic Version of Tree Adjoining Grammar (DV-TAG), with special reference to similarities and differences with respect to LTAG.
Dynamic grammars define well-formedness in terms of states and transitions between states. They allow a natural formulation of incremental processing, where each word w_i defines a transition from State_{i-1}, also called the left context, to State_i (Milward, 1994). The states can be defined as partial syntactic or semantic structures that are "updated" as each word is recognized; roughly speaking, two adjacent states can be thought of as two parse trees before and after the attachment of a word, respectively. The derivation process proceeds from left to right by extending a fully connected left context to include the next input word.
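The state-transition view can be sketched in a few lines of Python. This is only an illustration under strong simplifying assumptions: `State`, `transition` and `derive` are hypothetical names, and the "attachment" is reduced to recording the word, whereas an actual dynamic grammar would update a syntactic or semantic structure.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class State:
    """A left context, reduced here to the words attached so far (an assumption)."""
    words: tuple = ()


def transition(left_context: State, word: str) -> State:
    """Word w_i maps State_{i-1} to State_i; here it only extends the span."""
    return State(left_context.words + (word,))


def derive(sentence):
    """Fold the transitions from left to right, starting from the empty state."""
    return reduce(transition, sentence, State())


final = derive(["Bill", "often", "pleases", "Sue"])
print(final.words)
```

Each intermediate value of the fold plays the role of a left context that spans the words read so far.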
Like an LTAG (Joshi and Schabes, 1997), a Dynamic Version of Tree Adjoining Grammar (DV-TAG) consists of a set of elementary trees, divided into initial trees and auxiliary trees, and attachment operations for combining them. Lexicalization is expressed through the association of a lexical anchor with each elementary tree. The anchor defines the semantic content of the elementary tree: the whole elementary tree can be seen as an extended projection of the anchor (Frank, 2000). LTAG is said to define an extended domain of locality: unlike context-free grammars, which use rules that describe one-branch-deep fragments of trees, TAG elementary trees can describe larger structures (e.g. a verb, its maximal S node and subject NP node).
In Figures 1(a) and 2(a) we can see the elementary trees for a derivation of the sentence Bill often pleases Sue for LTAG and DV-TAG respectively. Auxiliary trees in DV-TAG are split into left auxiliary trees, where the lexical anchor is on the left of the foot node, and right auxiliary trees, where the lexical anchor is on the right of the foot node. The tree anchored by often in Fig. 2(a) is a left auxiliary tree.
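As a small illustration of the left/right distinction, the following sketch classifies an auxiliary tree from the left-to-right order of its frontier. The function name `auxiliary_kind` and the flat list encoding of the frontier are assumptions made for this example only.

```python
def auxiliary_kind(frontier, anchor, foot):
    """Classify an auxiliary tree from its left-to-right frontier labels.

    A left auxiliary tree has its lexical anchor to the left of the foot
    node; a right auxiliary tree has the anchor to the right of it.
    """
    if frontier.index(anchor) < frontier.index(foot):
        return "left"
    return "right"


# Frontier of the tree anchored by "often": the anchor precedes the foot VP*.
print(auxiliary_kind(["often", "VP*"], anchor="often", foot="VP*"))
```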
Figure 1: The LTAG derivation of the sentence Bill often pleases Sue: (a) the elementary trees, combined by substitution and adjunction; (b) the derivation tree; (c) the derived tree.
Figure 2: The DV-TAG derivation of the sentence Bill often pleases Sue: (a) the elementary trees, combined by 1. adjunction from the left, 2. shift, 3. substitution, with the predicted preterminal V(_i) referring to a word list (pleases, likes, eats, plays, ...); (c) the derived tree.
Non-terminal nodes have a distinguished head daughter, which provides the lexical head of the mother node: unlike in LTAG, each node in the elementary trees is augmented with a feature indicating the lexical head that projects the node. This feature is needed for the notion of derivation-dependency tree (see below). If several unheaded nodes share the same lexical head, they are all co-indexed with a head variable (e.g. _i in the elementary tree anchored by Bill in Figure 2(a)); the head variable is a variable in logic terms: _i will be unified with the constant ("lexical head") pleases.
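The binding of a head variable can be sketched as follows. The `Node` class and the `bind` function are hypothetical, lexical leaves are omitted, and unification is simplified to replacing a variable name with a constant.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    category: str
    head: str  # a lexical head such as "Bill", or a head variable such as "_i"
    children: list = field(default_factory=list)


def bind(node, variable, constant):
    """Replace every occurrence of a head variable with a lexical constant."""
    if node.head == variable:
        node.head = constant
    for child in node.children:
        bind(child, variable, constant)


# Extended tree anchored by Bill, as read from Figure 2(a).
tree = Node("S", "_i", [
    Node("NP", "Bill"),
    Node("VP", "_i", [Node("V", "_i"), Node("NP", "_k")]),
])

bind(tree, "_i", "pleases")
vp = tree.children[1]
print(tree.head, vp.head, vp.children[0].head, vp.children[1].head)
```

After the binding, all nodes co-indexed with _i carry the head pleases, while the object NP keeps its own variable _k until a later step fills it.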
In both LTAG and DV-TAG the lexical anchor does not necessarily provide the head feature of the root of the elementary tree. This is trivially true for auxiliary trees (e.g. the tree anchored by often in Figure 1(a) and Figure 2(a)). However, in DV-TAG this can also occur with initial trees (e.g. the tree anchored by Bill in Figure 2(a)), because initial trees can include not only the head projection of the anchor, but also other higher projections that are required to account for the full connectedness of the partial parse tree. The elementary tree anchored by Bill is linguistically motivated up to the NP projection; the rest of the structure depends on connectivity. These extra nodes are called predicted nodes. A predicted preterminal node is referred to by a set of lexical items. In Section 3 we illustrate a method for building such extended elementary trees.
The derivation process in LTAG and DV-TAG builds a derived tree by combining the elementary trees via some operations that are illustrated below. DV-TAG implements the incremental process by constraining the derivation process to be a series of steps in which an elementary tree is combined with the partial tree spanning the left fragment of the sentence. The result of a step is an updated partial structure. Specifically, at processing step i, the elementary tree anchored by the i-th word in the sentence is combined with the partial structure spanning the words from position 1 to position i-1; the result is a partial structure spanning the words from 1 to i. In contrast, LTAG does not pose any order constraint on the derivation process, and the combinatorial operations are defined over pairs of elementary trees. In DV-TAG the derivation process starts from an elementary tree that is anchored by the first word in the sentence and that does not require any attachment that would introduce lexical material on the left of the anchor (such as in the case that a substitution node is on the left of the anchor). This elementary tree becomes the first left context that has to be combined with some elementary tree on the right.
Since in DV-TAG we always combine a left context with an elementary tree, the number of attachment operations increases from two in LTAG to six in DV-TAG. Three operations (substitution, adjunction from the left and adjunction from the right) are called forward operations because they insert the current elementary tree into the left context; two other operations (inverse substitution and inverse adjunction) are called inverse operations because they insert the left context into the current elementary tree; the sixth operation (shift) does not involve any insertion of new structural material.
The first operation in DV-TAG is the standard LTAG substitution, where some elementary tree replaces a substitution node in another tree structure (see Fig. 2(a)).
Standard LTAG adjunction is split into two operations: adjunction from the left and adjunction from the right. The type of adjunction depends on the position of the lexical material introduced by the auxiliary tree with respect to the material currently dominated by the adjoined node (which is in the left context). In Figure 2(a) we have an adjunction from the left in the case of the left auxiliary tree anchored by often.
Inverse operations account for the insertion of the left context into the elementary tree. In the case of inverse substitution the left context replaces a substitution node in the elementary tree; in the case of inverse adjunction, the left context acts like an auxiliary tree, and the elementary tree is split because of the adjoining of the left context at some node. Lombardo and Sturt (2002b) show the importance of the latter operation for obtaining the correct dependencies for Dutch cross-serial dependencies in DV-TAG.
Finally, the shift operation either scans a lex-
ical item which has been already introduced in
the structure or derives a lexical item from some
predicted preterminal node.
It is important to notice that, during the derivation process, not all the nodes in the left context and the elementary tree are accessible for performing some operation: given the (i-1)-th word in the sentence we can compute a set of accessible nodes in the left context (the right fringe); also, given the lexical anchor of the elementary tree, which in the derivation process matches the i-th word in the sentence, we can compute a set of accessible nodes in the elementary tree (the left fringe).
At the end of the derivation process the left context structure spans the whole sentence, and is called the derived tree: Figures 1(c) and 2(c) show the derived trees for Bill often pleases Sue in LTAG and DV-TAG respectively.
A key device in LTAG is the derivation tree (Fig. 1(b)). The derivation tree represents the history of the derivation of the sentence: it describes the substitutions and the adjoinings that occur in a sentence derivation through a tree structure. The nodes of the derivation tree are identifiers of the elementary trees, and one edge represents the operation that combines two elementary trees. Given an edge, the mother node identifies the elementary tree where the elementary tree identified by the daughter node is substituted in or adjoined to, respectively. The derivation tree provides a factorized representation of the derived tree. Since each elementary tree is anchored by a lexical item, the derivation tree also describes the syntactic dependencies in the sentence in terms of a dependency-style representation (Rambow and Joshi, 1999; Dras et al., 2003).
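The reading of a derivation tree as a dependency structure can be made concrete with a short sketch. The edge list below is our reading of Figure 1(b), with elementary trees identified by their anchors; the triple encoding and the function name `dependencies` are assumptions of this example.

```python
# Derivation tree for "Bill often pleases Sue": (mother, daughter, operation).
derivation_edges = [
    ("pleases", "Bill", "substitution"),
    ("pleases", "often", "adjunction"),
    ("pleases", "Sue", "substitution"),
]


def dependencies(edges):
    """Map each anchor to its governor, i.e. the anchor of the mother tree."""
    return {daughter: mother for mother, daughter, _ in edges}


deps = dependencies(derivation_edges)
print(deps)
```

Because every elementary tree has exactly one anchor in LTAG, each edge of the derivation tree translates directly into one word-to-word dependency.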
The notion of derivation tree is not adequate for DV-TAG, since the elementary trees contain unheaded predicted nodes. For example, the elementary tree anchored by Bill actually involves two anchors, Bill and pleases, even if the latter anchor remains unspecified until it is scanned/derived in the linear order. We introduce a new word-based structure that represents syntactic dependencies, namely a derivation-dependency tree.
A derivation-dependency tree is a head-based version of the derivation tree. Each node in an elementary tree is augmented with the lexical head that projects that node. The derivation-dependency tree contains one node per lexical head, and a lexical head dominates another when the corresponding projections in the derived tree stand in a dominance relation. Each elementary tree can contain only one overtly marked lexical head, which represents the semantic unit, but the presence of predicted nodes in the partial derived tree corresponds to predicted heads in the derivation-dependency tree. Figure 3 depicts the evolution of the derivation-dependency tree for the sentence Bill often pleases Sue.
Figure 3: The DV-TAG derivation of the sentence Bill often pleases Sue.
The DV-TAG derivation process requires the full connectivity of the left context at all times. The extended domain of locality provided by LTAG elementary trees appears to be a desirable feature for implementing full connectivity. However, each new word in a string has to be connected with the preceding left context, and there is no a priori limit on the amount of structure that may intervene between that word and the preceding context. For example, in a DV-TAG derivation of John said that tasty apples were on sale, the adjective tasty cannot be directly connected with the S node introduced by that; there is an intervening NP symbol that has not yet been predicted in the structure. Another example is the case of an intervening modifier between an argument and its predicative head, as in the example Bill often pleases Sue (see Figure 2), where in order to scan often we need a VP adjunction node that the NP projection cannot introduce. So, the extended domain of locality available in LTAG has to be further extended. In particular, some structures have to be predicted as soon as there is some evidence from arguments or modifiers on the left. In other approaches this extension is implemented via top-down predictions (see e.g. Roark (2001)) during the parsing process. This can lead to a high number of combinations that raise the degree of local ambiguity in the derivation process. In fact, in the case of Roark (2001), the method to reduce this problem has been to use underspecification in the right part of the context-free rules.
In the remainder of this paper we address the issue of building a wide coverage DV-TAG grammar where elementary trees extend the domain of locality given by the argumental structure, and we provide an empirical evaluation of the possible combinatorial problems that can arise with such extended structures.
3 Building a DV-TAG lexicon
The method used to build a wide coverage DV-TAG grammar is to start with an existing LTAG grammar, and to extend the elementary trees through a closure of a left-associative operation.
First, the LTAG elementary tree nodes have
to be augmented with the lexical head informa-
tion through a percolation procedure that takes
into account the syntactic projections.
Then, the elementary trees must be extended to account for the full connectivity. Given that one step of the derivation process is a combination of a left context and an elementary tree, the rightmost symbol of the left context and the leftmost anchor of the elementary tree (the current input word) must be adjacent in the sentence. However, it is possible (as we have illustrated above) that the left context and the elementary tree cannot be combined through any of the five DV-TAG operations. But if the combination between the left context and the elementary tree can occur once we assume some intervening structure, we can build a superstructure that includes the elementary tree and extends it until either the left context can be inserted in the left fringe of the new superstructure or the new superstructure can be inserted in the right fringe of the left context. In building the superstructures, we require that the linguistic dependencies posed by the LTAG elementary trees over the lexical heads in the derivation-dependency tree are maintained, in order not to disrupt the semantic interpretation process.
Since no new symbol can intervene between the rightmost symbol of the left context and the leftmost anchor of the elementary tree (the current input word), the elementary tree must be extended in ways that do not alter this linear order of the terminal symbols. This means that the elementary tree must be extended without introducing any further structure that can in turn derive terminal symbols on the left of the leftmost anchor. In order to satisfy such a constraint, the elementary tree has to be left-anchored, i.e. the leftmost symbol of the elementary tree, except possibly for the foot node in the case of a right auxiliary tree, is an anchor. Then, the operation that extends the left-anchored elementary trees is the left association. The left association starts from the root of the elementary tree and combines it with another elementary tree on the right through either inverse operation (see above; see also footnote 2); this combination is iterated as far as possible through a transitive closure of left association (see below). All the combinations are stored in the extended lexicon.
Footnote 2: There are some similarities between left association and the CCG type raising operation (Steedman, 2000), because in both cases some (root) category X is "raised" to some higher category Y.
Since the individual elementary trees that form a superstructure through left association are not altered in this process, linguistic dependencies are kept unchanged. Left association can be performed during the parsing/derivation process (i.e. on-line) or with the goal of extending the lexicon (i.e. off-line). Since we are exploring the consequences of increasing the role of the competence grammar, we perform this operation off-line (see the next section).
Figure 4: Example of the left association operation: the trees on the top are, respectively, the Base tree and the Raising tree; the tree on the bottom is the Raised tree.
Each left association operation takes two trees as input, a left-anchored Base tree and a Raising tree, and produces a new left-anchored Raised tree as output. A Base tree can be any left-anchored elementary tree or a Raised tree. A Raising tree is any elementary tree that allows the Base tree to combine on its left via either inverse substitution or inverse adjunction. A Raised tree is a tree in which the Base tree has been attached to the Raising tree by inverse substitution or inverse adjunction.
The application of the transitive closure of left association is subject to the termination condition of minimal recursive structure, that is, the non-repetition of the root category in the sequence of Raising trees (henceforth root sequence). So, if the original Base tree or some Raising tree already employed has a root X, we cannot use a Raising tree rooted in X anymore for the same superstructure.
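The closure and its termination condition can be sketched abstractly. In the following illustration each Raising tree is reduced to a pair (site category, root category), meaning that a structure rooted in the site category can be attached on its left; this encoding, the function name `root_sequences` and the toy inventory are all assumptions, and the distinction between inverse substitution and inverse adjunction is ignored.

```python
def root_sequences(base_root, raising_trees):
    """Enumerate the root sequences reachable from a Base tree by left association.

    raising_trees is a list of (site_category, root_category) pairs.
    """
    results = []

    def extend(sequence):
        for site, root in raising_trees:
            # Termination condition: a root category is never repeated.
            if site == sequence[-1] and root not in sequence:
                longer = sequence + [root]
                results.append(longer)
                extend(longer)

    extend([base_root])
    return results


# Toy inventory, not meant as a linguistic claim.
toy_raising_trees = [("NP", "S"), ("NP", "NP"), ("S", "VP"), ("VP", "S")]
for sequence in root_sequences("NP", toy_raising_trees):
    print(" -> ".join(sequence))
```

Since every step adds a category that has not been used before, the length of a root sequence is bounded by the number of root categories, so the enumeration always terminates.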
Considering that LTAG is a lexicalized formalism, we immediately realize that a superstructure is multiply anchored. As an example, consider the left association illustrated in Figure 4: we substitute the tree anchored by John into the tree anchored by likes, yielding a larger elementary structure multiply anchored by John and likes at the same time (the lexical head information for each node has been omitted).
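A minimal sketch of this example follows, using inverse substitution on toy tree objects. The `Tree` class and the helper functions are hypothetical; the tree shapes are modelled on the trees for Bill and pleases in Figure 1(a), and head features are omitted as in the example above.

```python
from dataclasses import dataclass, field


@dataclass
class Tree:
    label: str
    children: list = field(default_factory=list)
    subst: bool = False  # True for a substitution node


def inverse_substitution(left_context, elementary):
    """Copy `elementary`, replacing its leftmost matching substitution node
    with the left context."""
    done = False

    def rebuild(node):
        nonlocal done
        if not done and node.subst and node.label == left_context.label:
            done = True
            return left_context
        return Tree(node.label, [rebuild(c) for c in node.children], node.subst)

    raised = rebuild(elementary)
    if not done:
        raise ValueError("no matching substitution node")
    return raised


def frontier(tree):
    """Left-to-right sequence of leaf labels."""
    if not tree.children:
        return [tree.label]
    return [leaf for child in tree.children for leaf in frontier(child)]


john = Tree("NP", [Tree("NNP", [Tree("John")])])                    # Base tree
likes = Tree("S", [Tree("NP", subst=True),                          # Raising tree
                   Tree("VP", [Tree("V", [Tree("likes")]),
                               Tree("NP", subst=True)])])
raised = inverse_substitution(john, likes)                          # Raised tree
print(frontier(raised))
```

The Raised tree is left-anchored by John, contains likes as a second anchor, and still has an open object substitution node on its right.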
Figure 5: Schema of the left association operation, followed by the factorization in template trees.
Multiple anchoring, when not linguistically motivated as in the case of idioms or specific subcategorization constraints, leads to some potential problems. The first is the theoretical issue of semantic compositionality, because the superstructures do not reflect the incremental process in the semantic composition once words are no longer the minimal semantic units (as assumed in LTAG). The second is a practical issue of duplicating the stored information for all the verbs sharing the predicate-argument structure. For example, in the previous example, all the transitive verbs have a tree structure identical to the elementary tree of likes (see Fig. 5).
These two problems can be solved by introducing the notion of template, already present in the practical implementations of wide coverage LTAG systems (Doran et al., 2000). A tree template is a single elementary tree that represents the set of elementary trees sharing the same structure except for the lexical anchor: one single structure is referred to by pointers from the word list. All the identical tree structures that have the same leftmost anchor are represented as a single template, where only the leftmost anchor is lexically realized; all the other anchors are replaced by variables with referring word lists that explicitly state the range of variation of the variables themselves. A variable replaces each occurrence of the lexical item in the original elementary structure: once a shift operation matches the current input word with one of the words in the associated list, we also have to unify the lexical head variables that augment non-terminal symbols with the current input word. For instance, at the bottom of Figure 5, there is the template obtained by the left association of the elementary tree anchored by John with all the identical elementary trees of transitive verbs.
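The factorization into templates and the effect of the shift operation can be sketched as follows. Structures are compared as plain strings here, and the names `factorize` and `shift`, the string notation for trees and the verb list are assumptions made for the illustration.

```python
from collections import defaultdict


def factorize(superstructures):
    """Group multiply anchored structures into templates.

    Each item is (leftmost_anchor, structure, other_anchor). Structures that
    share leftmost anchor and shape collapse into one template whose head
    variable ranges over the collected word list.
    """
    templates = defaultdict(list)
    for leftmost, structure, other in superstructures:
        templates[(leftmost, structure)].append(other)
    return dict(templates)


def shift(word_list, bindings, variable, word):
    """Match the input word against the word list and bind the head variable.

    Returns the extended bindings, or None when the word is not licensed.
    """
    if word not in word_list:
        return None
    return {**bindings, variable: word}


structure = "S(_i)[NP(John) VP(_i)[V(_i) NP$(_k)]]"
lexicon = factorize([("John", structure, verb)
                     for verb in ["likes", "pleases", "eats"]])
word_list = lexicon[("John", structure)]
print(len(lexicon), word_list)
print(shift(word_list, {}, "_i", "pleases"))
```

Three superstructures are stored as one template plus a word list, which addresses the duplication problem; the anchor of the verb is only fixed when the shift operation reads it.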
A further problem is a possible combinatorial explosion of the size of the lexicon: this problem has to be tackled in an empirical way on a wide coverage grammar. It could be that a large number of the theoretically possible combinations do not occur in practice; in fact, empirical work by Lombardo and Sturt (2002a) indicates that there is an empirical bound on the size of expanded elementary trees necessary to maintain connectedness.
4 Empirical tests
In order to estimate whether the combinatorial explosion has a dramatic effect on the lexicon we have run two tests, implementing the transitive closure of the left association. The first test was performed on a realistic grammar from the XTAG system (Doran et al., 2000), and the second test was performed on a grammar automatically extracted from an Italian treebank (Mazzei and Lombardo, 2004).
In the implementation of the recursive procedure, the left-association operation takes as input two templates: a left-anchored template, which we call the base template, and another template, which need not be left-anchored, which we call the raising template. In every step of the algorithm, the base template is taken from the subset of left-anchored templates and the raising template is picked from the whole lexicon. Since the algorithm builds only left-anchored templates, the output template is inserted in the left-anchored subset.
The grammar used in the first test has 628 tree templates representing one half of the hand-written XTAG grammar, with the same distribution of template families as the overall XTAG grammar. This is a realistic grammar size (consider that the XTAG lexicon, the widest LTAG grammar existing for English, has 1227 templates). 140 out of 628 were left-anchored templates, and the transitive closure from these base templates produces 176,190 raised templates, with a maximum of 7 left associations, and a distribution of trees that reaches its maximum at 4 left associations (140 base templates, 3,033 twice raised templates, 24,855 three-times raised, 62,970 four times raised, 59,908 five times, 22,454 six times, 2,970 seven times). Similar distributional results have been produced for subsets of the grammar and with restrictions on root categories. The number of raised templates drastically reduces when we forbid raising to verbal projections (S, VP and V), thus cutting one of the major sources of the explosion. In this case we go from 717 non-verbal base templates in the XTAG grammar to only 24,468 raised templates, again with a maximum of 7 raisings (notice that the base lexicon is larger than in the test above).
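The distribution reported for the first test can be checked against the reported total with a few lines of arithmetic. The dictionary keys follow the labels used above ("twice raised" through "seven times"); the variable names are ours.

```python
# Raised templates per label, as reported for the XTAG test.
raised_by_label = {2: 3_033, 3: 24_855, 4: 62_970, 5: 59_908, 6: 22_454, 7: 2_970}

total_raised = sum(raised_by_label.values())
peak = max(raised_by_label, key=raised_by_label.get)
growth = total_raised / 140  # raised templates per left-anchored base template

print(total_raised)      # 176190
print(peak)              # 4
print(round(growth))     # 1258
```

The counts add up to the 176,190 raised templates stated in the text, with the peak at four, which amounts to more than a thousand raised templates for each of the 140 base templates.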
In the second test we used an LTAG grammar extracted from the 45,000-word TUT (Turin University Treebank). The number of extracted tree templates was 1283. In this case, as the grammar was relatively large, we decided to impose an extra condition on the closure of the left association procedure. We estimated the maximum number of trees that need to be composed to create any one left-associated tree. This was done by inspecting the derivation trees for each sentence of the treebank, and looking at the leftmost child of each level of the derivation tree. It was found that no left-associated tree needed to be composed of more than three elementary trees in order to create a covering DV-TAG for the treebank. This replicates a previous result of Lombardo and Sturt (2002a). Moreover, of the 800 mathematically possible root sequences (see footnote 3) only 67 were present in the treebank. We decided to allow left association only for root sequences that actually appeared in the treebank. This resulted in a total of 706,866 left-associated trees. Of these, 988 were base templates, 87,245 were raised twice, and 618,654 were raised three times.
The combinatorial explosion seen in the two experiments suggests the use of underspecification techniques before applying DV-TAG in a realistic setting (see Roark (2001) for one method applied to Context Free Grammar). However, in order to estimate the amount of ambiguity that can arise in a parsing process we need to refer to some specific parsing model. Selective strategies on categories and restriction to empirically observed root sequences can also be effective. However, in order to make a full evaluation of these strategies, it is desirable to perform further coverage tests.
Footnote 3: In the TUT treebank we have 27 non-terminal symbols, and 20 possible root categories.
5 Conclusion
This paper has explored some consequences of building an incremental grammar that constrains the possible configurations of partial structures and the possible interpretable partial structures.
We have introduced a TAG-based formalism that encodes a strong notion of incrementality directly into the operations of the formal system.
A left-associative type-raising operation has been used to build a realistic DV-TAG lexicon from an LTAG lexicon. The left-association operation can be viewed as a sort of chunking of linguistic knowledge: this chunking could be useful in the definition of a language model in which specific combinations of elementary trees that have been successful in past experience become part of the lexicon.
As shown in the empirical tests on English and Italian, the left-association closure can lead to severe computational problems that suggest the adoption of some form of underspecification.

References
R. Berwick and A. Weinberg. 1984. The grammatical basis of linguistic performance: language use and acquisition. MIT Press.
M. W. Crocker. 1992. A Logical Model of Competence and Performance in the Human Sentence Processor. Ph.D. thesis, Dept. of Artificial Intelligence, University of Edinburgh, UK.
C. Doran, B. Hockey, A. Sarkar, B. Srinivas, and F. Xia. 2000. Evolution of the XTAG system. In A. Abeillé and O. Rambow, editors, Tree Adjoining Grammars, pages 371-405. Chicago Press.
M. Dras, D. Chiang, and W. Schuler. 2003. On relations of constituency and dependency grammars. Language and Computation, in press.
R. Frank. 2000. Phrase structure composition and syntactic dependencies. Unpublished Manuscript.
A. Joshi and Y. Schabes. 1997. Tree-adjoining grammars. In G. Rozenberg and A. Salomaa, editors, Handbook of Formal Languages, pages 69-123. Springer.
R. Kempson, W. Meyer-Viol, and D. Gabbay. 2000. Dynamic Syntax: the Flow of Language Understanding. Blackwell, Oxford, UK.
V. Lombardo and P. Sturt. 2002a. Incrementality and lexicalism: A treebank study. In S. Stevenson and P. Merlo, editors, Lexical Representations in Sentence Processing. John Benjamins.
V. Lombardo and P. Sturt. 2002b. Towards a dynamic version of TAG. In TAG+6, pages 30-39.
A. Mazzei and V. Lombardo. 2004. Building a large grammar for Italian. In LREC04. In press.
D. Milward. 1994. Dynamic dependency grammar. Linguistics and Philosophy, 17:561-604.
C. Phillips. 2003. Linear order and constituency. Linguistic Inquiry, 34:37-90.
O. Rambow and A. Joshi. 1999. A formal look at dependency grammars and phrase structure grammars, with special consideration of word-order phenomena. In Recent Trends in Meaning-Text Theory, pages 167-190. John Benjamins, Amsterdam and Philadelphia.
B. Roark. 2001. Probabilistic top-down parsing and language modeling. Computational Linguistics, 27(2):249-276.
S. M. Shieber and M. Johnson. 1993. Variations on incremental interpretation. Journal of Psycholinguistic Research, 22(2):287-318.
E. P. Stabler. 1991. Avoid the pedestrians' paradox. In R. C. Berwick, S. P. Abney, and C. Tenny, editors, Principle-based parsing: computation and psycholinguistics, pages 199-237. Kluwer, Dordrecht: Holland.
M. J. Steedman. 2000. The syntactic process. A Bradford Book, The MIT Press.
