Coping with problems in grammars automatically extracted from treebanks
Carlos A. Prolo
Computer and Information Science Department
University of Pennsylvania
Suite 400A, 3401 Walnut Street
Philadelphia, PA, USA, 19104-6228
prolo@linc.cis.upenn.edu
Abstract
We report on an experiment in the automatic extraction of a Tree Adjoining Grammar from the WSJ corpus of the Penn Treebank. We use an automatic tool developed by (Xia, 2001), adapted to our particular needs. Rather than addressing general aspects of automatic extraction, we focus on the problems we encountered in extracting a linguistically (and computationally) sound grammar and on approaches to handling them.
1 Introduction
Much linguistic research is oriented to finding
general principles for natural language, classify-
ing linguistic phenomena, building regular mod-
els (e.g., grammars) for the well-behaved (or well-
understood) part of languages and studying remain-
ing “interesting” problems in a compartmentalized
way. With the availability of treebanks, large natural language corpora annotated for syntactic structure, e.g., (Marcus et al., 1993), automatic grammar extraction became possible (Chen and Vijay-Shanker, 2000; Xia, 1999). Suddenly, grammars started being extracted with the aim of “full” coverage of the constructions in a given language (of course, only to the extent that the corpora used represent the language). That immediately poses a question: if we do not know how to model many phenomena grammatically, how can it be that we are extracting such wide-coverage grammars?
To answer that question we have to start a new
thread at the edge of linguistics and computational
linguistics. More than numbers to express coverage,
we have to start analyzing the quality of automat-
ically generated grammars, identifying extraction
problems and uncovering whatever solutions are be-
ing given for them, however interesting or ugly they
might be, challenging the current paradigms of lin-
guistic research to provide answers for the problems
on a “by-need” basis.
In this paper we report on a particular experience of automatic extraction of an English grammar from the WSJ corpus of the Penn Treebank (PTB) (Marcus et al., 1994)1 using Tree Adjoining Grammars (TAGs) (Joshi and Schabes, 1997). We use an automatic tool developed by (Xia, 2001), adapted to our particular needs, and focus on some problems we encountered in extracting a linguistically (and computationally) sound grammar and on the solutions we gave to them. The list of problems is a sample, far from exhaustive.2 Likewise, the solutions will not always be satisfactory.
In Section 2 we introduce the method of grammar
extraction employed. The problems are discussed in
Section 3. We conclude in Section 4.
2 The extracted grammar
2.1 TAGs
A TAG is a set of lexicalized elementary trees that
can be combined, through the operations of tree ad-
junction and tree substitution, to derive syntac-
tic structures for sentences. We follow a common
approach to grammar development for natural lan-
guage using TAGs, under which, driven by local-
ity principles, each elementary tree for a given lex-
ical head is expected to contain its projection, and
slots for its arguments (e.g., (Frank, 2002)). Figure 1 shows typical grammar template trees that can be selected by lexical items and combined to generate the structure in Figure 2, which shows both the derived tree and the derivation tree; the latter contains the history of the tree grafting process that generated the former.3
1We assume some familiarity with the basic notations in the
PTB as in (Marcus et al., 1994).
2(Prolo, 2002) includes a more comprehensive and detailed
discussion of grammar extraction alternatives and problems.
3For a more comprehensive introduction to TAGs and Lexi-
calized TAGs we refer the reader to (Joshi and Schabes, 1997).
np: (NP N)
vt: (S NP (VP V NP))
ppright: (VP VP* (PP P NP))
det: (NP DT NP*)
Figure 1: An example of Tree Adjoining Grammar
Derived tree:
(S (NP (N [John]))
(VP (VP (V [saw])
(NP (N [Mary])))
(PP (P [from])
(NP (DT [the])
(NP (N [window]))))))
Derivation tree:
vt[saw]
  np[John]
  np[Mary]
  pp[from]
    np[window]
      det[the]
Figure 2: Derivation of John saw Mary from the window
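The two combining operations can be illustrated with a minimal Python sketch that replays the derivation of Figure 2. The class `Node` and the functions `substitute`, `adjoin` and `words` are hypothetical names introduced only for this illustration; they are not part of LexTract or of any TAG toolkit, and the sketch ignores features and adjunction constraints.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None   # lexical anchor, if any
    subst: bool = False          # open substitution slot
    foot: bool = False           # foot node of an auxiliary tree

def find(node, pred):
    """Return the first node (pre-order) satisfying pred, or None."""
    if pred(node):
        return node
    for child in node.children:
        hit = find(child, pred)
        if hit is not None:
            return hit
    return None

def substitute(host, initial):
    """Fill the first open substitution slot labeled like the root of `initial`."""
    slot = find(host, lambda n: n.subst and n.label == initial.label)
    if slot is None:
        raise ValueError("no open slot for " + initial.label)
    slot.children, slot.word, slot.subst = initial.children, initial.word, False

def adjoin(target, aux):
    """Adjoin auxiliary tree `aux` at node `target`."""
    foot = find(aux, lambda n: n.foot)
    assert target.label == aux.label == foot.label
    # The foot takes over the material below the target ...
    foot.children, foot.word, foot.foot = target.children, target.word, False
    # ... and the target node becomes the root of the auxiliary tree.
    target.children, target.word = aux.children, None

def words(node):
    if node.word is not None:
        return [node.word]
    return [w for c in node.children for w in words(c)]

def np(word):
    return Node("NP", [Node("N", word=word)])

# Elementary trees of Figure 1, lexicalized as in Figure 2.
s = Node("S", [Node("NP", subst=True),
               Node("VP", [Node("V", word="saw"), Node("NP", subst=True)])])
pp = Node("VP", [Node("VP", foot=True),
                 Node("PP", [Node("P", word="from"), Node("NP", subst=True)])])
det = Node("NP", [Node("DT", word="the"), Node("NP", foot=True)])

# Replay the derivation tree: vt[saw] <- np[John], np[Mary], pp[from] <- np[window] <- det[the]
substitute(s, np("John"))
substitute(s, np("Mary"))
window = np("window")
adjoin(window, det)
substitute(pp, window)
adjoin(s.children[1], pp)
sentence = words(s)
```

Under these assumptions, `sentence` spells out the words of the derived tree in order, and the VP of the verb tree ends up dominating both the original VP material and the adjoined PP.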
2.2 LexTract
Given an annotated sentence from the PTB as input, Xia’s LexTract tool (Xia, 1999; Xia, 2001) first executes a rebracketing step. More precisely, additional nodes are inserted to separate arguments from modifiers and to structure the modification process as binary branching. Figure 3 shows a typical rebracketed PTB tree,4 in which we have distinguished the tree nodes inserted by LexTract.
The second stage is the extraction of the grammar trees proper, shown in Figure 4. In particular, recursive modifier structures have to be detected and factored out of the derived tree to compose the auxiliary trees, the rest becoming an initial tree. The process is also recursive in the sense that factored subtree structures themselves undergo the spinning-off process until all modifiers have their own trees, all the arguments of a head are substitution nodes of the tree containing their head, and the material under the argument nodes defines additional initial trees of its own. Auxiliary trees are extracted from parent-child pairs with matching labels if the child is elected the parent’s head and the child’s sibling is marked as a modifier: the parent is mapped into the root of an auxiliary tree and the head child into its foot, with the sibling subtree (after being recursively processed) carried along into the auxiliary tree. Notice that the auxiliary trees are therefore either strictly right or strictly left branching, with the foot always immediately under the root node. Other kinds of auxiliary trees are not allowed.
4Figures 3 and 4 are thanks to Fei Xia. We are also grateful to her for allowing us to use LexTract and make changes to its source code to customize it to our needs.
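The factoring step just described can be sketched as follows. This is only an illustration under simplifying assumptions, not LexTract's code: the input is assumed to be already rebracketed, each node carries a hypothetical `role` field ("head", "arg" or "mod") relative to its parent, and `Node`, `show` and `factor` are names invented for the example. Substitution nodes are written with ↓ and foot nodes with *.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    label: str
    role: str = "head"            # role w.r.t. the parent: "head", "arg" or "mod"
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None

def show(n):
    """Render a tree in bracket notation."""
    if n.word is not None:
        return f"({n.label} {n.word})"
    if not n.children:
        return n.label            # substitution (↓) or foot (*) leaf
    return "(" + n.label + " " + " ".join(show(c) for c in n.children) + ")"

def factor(node, out):
    """Return `node` with modifiers and arguments spun off; spun-off trees go to `out`."""
    if not node.children:
        return node
    kids = node.children
    if len(kids) == 2:
        head = next((k for k in kids if k.role == "head"), None)
        mod = next((k for k in kids if k.role == "mod"), None)
        if head is not None and mod is not None and head.label == node.label:
            # Parent becomes the root of an auxiliary tree, the head child its foot.
            foot = Node(node.label + "*")
            body = factor(mod, out)
            pair = [foot, body] if kids[0] is head else [body, foot]
            out.append(show(Node(node.label, children=pair)))
            return factor(head, out)
    new_kids = []
    for k in kids:
        if k.role == "arg":
            # Arguments become substitution nodes; their material is a new initial tree.
            out.append(show(factor(k, out)))
            new_kids.append(Node(k.label + "↓"))
        else:
            new_kids.append(factor(k, out))
    return Node(node.label, node.role, new_kids)

# "they still draft policies", a fragment of the sentence in Figure 3
inner = Node("VP", "head", [Node("VBP", "head", word="draft"),
                            Node("NP", "arg", [Node("NNS", word="policies")])])
outer = Node("VP", "head", [Node("ADVP", "mod", [Node("RB", word="still")]), inner])
root = Node("S", "head", [Node("NP", "arg", [Node("PRP", word="they")]), outer])

trees = []
initial = show(factor(root, trees))
```

Here `initial` is the verb's initial tree with two NP substitution slots, and `trees` collects the two NP initial trees and the left-branching VP auxiliary tree for "still".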
[Tree diagram: rebracketed PTB tree for “at FNX they still draft policies using pens”, with the nodes inserted by LexTract distinguished]
Figure 3: LexTract rebracketing stage
To extract a grammar with Xia’s tool one has to define tables for finding: the head child of a constituent expansion; which of the siblings of a head are acceptable arguments; and which constituent labels are plausible modifiers of another. Special provisions are made for handling coordination. For additional information see (Xia, 2001). In this paper we refer to the table settings and extracted grammar of (Xia, 1999), which we used as our starting point, as Xia’s sample. We used a customized version of LexTract, plus additional pre-processing of the PTB input and post-processing of the extracted trees.
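To make the role of the three tables concrete, the sketch below shows one way such tables might be consulted. The table contents, their format and the function names (`base`, `head_index`, `sibling_roles`) are assumptions made for this example; they are neither Xia's sample settings nor LexTract's actual file format.

```python
# Illustrative entries only; these are NOT the settings used in the experiment.
HEAD_TABLE = {   # parent label -> (search direction, child labels in priority order)
    "S": ("right", ["VP", "S"]),
    "VP": ("left", ["VP", "VB", "VBP", "VBZ", "VBD", "VBG", "VBN", "MD"]),
    "NP": ("right", ["NP", "NN", "NNS", "NNP", "PRP"]),
    "PP": ("left", ["IN", "TO"]),
}
ARG_TABLE = {    # head POS tag -> sibling labels acceptable as arguments
    "VBP": {"NP", "S", "SBAR"},
    "IN": {"NP"},
}
MOD_TABLE = {    # modified label -> labels that are plausible modifiers of it
    "S": {"PP", "ADVP"},
    "VP": {"ADVP", "PP", "SBAR"},
    "NP": {"ADJP", "PP", "SBAR"},
}

def base(label):
    """Strip PTB function tags and indices, e.g. 'NP-SBJ-1' -> 'NP'."""
    return label.split("-")[0] or label

def head_index(parent, child_labels):
    """Index of the head child of `parent` according to HEAD_TABLE."""
    direction, priorities = HEAD_TABLE[base(parent)]
    order = list(range(len(child_labels)))
    if direction == "right":
        order.reverse()
    labels = [base(c) for c in child_labels]
    for wanted in priorities:
        for i in order:
            if labels[i] == wanted:
                return i
    return order[0]   # fallback: first child in the search direction

def sibling_roles(parent, child_labels, head_tag):
    """Classify each child as 'head', 'arg', 'mod' or 'unknown'."""
    h = head_index(parent, child_labels)
    roles = []
    for i, label in enumerate(child_labels):
        if i == h:
            roles.append("head")
        elif base(label) in ARG_TABLE.get(head_tag, set()):
            roles.append("arg")
        elif base(label) in MOD_TABLE.get(base(parent), set()):
            roles.append("mod")
        else:
            roles.append("unknown")
    return roles
```

For example, with these toy tables `sibling_roles("VP", ["VBP", "NP", "PP-LOC"], "VBP")` marks the verb as head, the NP as argument and the PP as modifier.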
3 Extraction Problems
Extraction problems arise from several sources, in-
cluding: (1) lack of proper linguistic account,5 (2)
the (Penn Treebank) annotation style, (3) the (Lex-
Tract) extraction tool, (4) possible unsuitability of
the (TAG) model, and (5) annotation errors. We
refrained from making a rigid classification of the
problems we present according to these sources. In
particular it is often difficult to decide whether to
blame sources (1), (3), or (5) for a certain problem.
We will not discuss in this paper problems due to
annotation errors. As for the PTB style problems
we only discuss one, the first listed below.
5This includes the (occasional) inability on the part of grammar developers to find or make use of an existing account.
[Tree diagrams: a) Input tree decomposition, in which the nodes of the rebracketed tree of Figure 3 are grouped into the elementary trees #1 to #8; b) Extracted elementary trees #1 to #8]
Figure 4: LexTract extraction stage
(S-3 (NP-SBJ (PRP We))
(VP (VBP make)
(SBAR-NOM (WHNP-1 (WP what))
(S we know
how to make))))
a) As a sentential clause in the PTB
(S-3 (NP-SBJ (PRP We))
(VP (VBP make)
(NP (NP (WP what))
(SBAR (WHNP-1 (-NONE- 0))
(S we know ...)))))
b) As a noun phrase after pre-processing
Figure 5: Free relatives in the Treebank
3.1 Free Relatives
Free relatives are annotated in the Penn Treebank as sentential complements, as in Figure 5.a. The extracted tree corresponding to the occurrence of “make” would be that of a verb that takes a sentential complement (SBAR). This does not seem to be correct,6 as the proper subcategorization of the verb occurrence is transitive.
In fact, free relatives may occur wherever an NP argument may occur. So, the only reasonable extraction account consistent with maintaining them as SBARs would be one in which every extracted tree with an NP substitution node had a counterpart tree, identical to the first except that the NP argument label is replaced with SBAR. Instead we opted to reflect the NP character of free relatives by pre-processing the corpus (using the Head analysis, for practical convenience). The annotated example is then automatically replaced with the one in Figure 5.b. Other (non-NP) cases of free relatives are rare and not likely to interfere with verb subcategorization.
6In both standard accounts of free relatives commonly discussed in the literature, the Head Account (e.g., (Bresnan and Grimshaw, 1978)) and the Comp Account (e.g., (Groos and von Riemsdijk, 1979)), the presence of the NP (or DP) is clear.
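The rewriting from Figure 5.a to Figure 5.b can be sketched as a small tree transformation. The representation (nested Python lists, `[label, child, ...]`, with `[POS, word]` leaves) and the function name `rewrite_free_relatives` are assumptions for illustration, and the sketch only covers the WHNP pattern of Figure 5; the pre-processing actually used in the experiment may differ in detail.

```python
def rewrite_free_relatives(tree):
    """Rewrite (SBAR-NOM (WHNP-k ...) S) as (NP (NP ...) (SBAR (WHNP-k (-NONE- 0)) S))."""
    if isinstance(tree, str):          # a word
        return tree
    label, *kids = tree
    kids = [rewrite_free_relatives(k) for k in kids]
    if (label.startswith("SBAR-NOM") and len(kids) == 2
            and not isinstance(kids[0], str) and kids[0][0].startswith("WHNP")):
        wh, clause = kids
        head_np = ["NP", *wh[1:]]              # the wh-word becomes the head NP
        empty_wh = [wh[0], ["-NONE-", "0"]]    # empty operator keeps the original index
        return ["NP", head_np, ["SBAR", empty_wh, clause]]
    return [label, *kids]

# Figure 5.a, with the embedded clause abbreviated
before = ["S-3", ["NP-SBJ", ["PRP", "We"]],
          ["VP", ["VBP", "make"],
           ["SBAR-NOM", ["WHNP-1", ["WP", "what"]],
            ["S", ["NP-SBJ", ["PRP", "we"]], ["VP", ["VBP", "know"]]]]]]
after = rewrite_free_relatives(before)
```

After the rewrite the verb “make” has an NP sibling, so a table-driven extractor would produce a transitive tree for it.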
3.2 Wh percolation up
In the Penn Treebank the same constituent is annotated with different syntactic categories depending on whether or not it carries the wh feature. For instance, a regular noun phrase has the syntactic category NP, whereas when the constituent is wh-marked, and is in the landing site of wh-movement, it carries the label WHNP.7 While that might look appealing, since the two constituents seem to have distinct distributional properties, it poses a design problem. While regular constituents inherit their syntactic categorial feature (i.e., their label) from their heads, wh projections are often formed by inheritance from their modifiers. For instance, “the father” is an NP, but when modified by a wh expression (“the father of whom”, “whose father”, “which father”) it becomes a WHNP. The only solution we see is to allow nouns and NPs to project freely up to WHNPs during extraction.8 On the other hand, in cases where the wh constituent is in a non-wh position, we need the opposite effect: a WHNP (or wh-noun POS tag) is allowed to project up to an NP.
7When the constituent is not wh-moved, it is correctly preserved as an NP, as “what” in “Who ate what?”.
8Of course another simple solution would be merging the wh constituents with their non-wh counterparts.
(NP (UCP (NN construction)
(CC and)
(JJ commercial))
(NNS loans))
a) NP modifiers
(VP (VB be)
(UCP-PRD (NP (CD 35))
(CC or)
(ADJP (JJR older))))
b) non-verbal Predicates
(VP (VB take)
(NP (NN effect))
(UCP-TMP (ADVP 96 days later)
(, ,)
(CC or)
(PP in early February)))
c) adverbial modifiers
Figure 6: “Unlike Coordinated Phrases”
3.3 Unlike Coordinated Phrases (UCP)
This is the expression used in the PTB to denote
coordinated phrases in which the coordinated con-
stituents are not of the same syntactic category. The
rationale for the existence of such constructions is
that the coordinated constituents are alternative re-
alizations of the same grammatical function with
respect to a lexical head. In Figure 6.a, both a
noun and an adjective are allowed to modify another
noun, and therefore they can be conjoined while re-
alizing that function. Two other common cases are:
coordination of predicates in copular constructions
(Figure 6.b) and adverbial modification (Figure 6.c).
We deal with the problem as follows. First, we al-
low for a UCP to be extracted as an argument when
the head is a verb and the UCP is marked predica-
tive (PRD function tag) in the training example; or
whenever the head is seen to have an obligatory
argument requirement (e.g., prepositions: “They
come from ((NP the house) and (PP behind the
tree))”). Second, a UCP is allowed to modify (ad-
join to) most of the nodes, according to evidence in
the corpus and common sense (in the first and third
examples above we had NP and VP modification).
With respect to the host tree, when attached as an argument the UCP is treated like any other non-terminal: a substitution node. The left tree in Figure 7 shows the case where the UCP is treated as a modifier. In fact the trees are both for the example in Figure 6.a. Notice that the tree is non-lexicalized to avoid effects of sparseness. The UCP is then expanded as in the right tree in Figure 7: an initial tree anchored by the conjunction (the tree attaches either to a tree like the one on the left or as a true argument; the latter would be the case for the example in Figure 6.b).
(NP UCP↓ NP*)        (UCP NN↓ CC◊ JJ↓)
(↓ marks a substitution node, * a foot node, ◊ an anchor)
Figure 7: Extracted trees for UCP
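The argument/modifier decision for UCPs described above can be summarized in a few lines. This is a hedged sketch: the function name `ucp_role`, the use of the verb POS prefix `VB`, and the default set of heads with an obligatory argument (prepositions, tagged IN or TO) are assumptions made for illustration, not the rules as implemented in the customized LexTract.

```python
def ucp_role(head_pos, ucp_label, obligatory_heads=frozenset({"IN", "TO"})):
    """Decide whether a UCP sibling of a head is extracted as argument or modifier."""
    function_tags = ucp_label.split("-")[1:]
    # Predicative UCP under a verbal head: argument.
    if head_pos.startswith("VB") and "PRD" in function_tags:
        return "arg"
    # Heads that require an argument (e.g. prepositions): argument.
    if head_pos in obligatory_heads:
        return "arg"
    # Otherwise the UCP adjoins as a modifier.
    return "mod"
```

With the examples of Figure 6, `ucp_role("VB", "UCP-PRD")` yields an argument, while the UCP under the noun in 6.a and the UCP-TMP in 6.c are treated as modifiers.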
Now, the caveats. First, we are giving the UCP
the status of an independent non-terminal, as if it
had some intrinsic categorial significance (as a syn-
tactic projection). The assumption of independence
of expansion, that for context-free grammars is in-
herent to each non-terminal, in TAGs is further re-
stricted to the substitution nodes. For example,
when an NP appears as substitution node, in a sub-
ject or object position, or as an argument of a prepo-
sition or a genitive marker, we are stating that any
possible expansion for the NP is licensed there. The
same happens for other labels in argument positions
as well. While that is an overgenerating assumption
(e.g. the expletive “there” cannot be the realization
of an NP in object position), it is generally true. For
the UCP, however, we know that its expansion is
in fact strongly dependent on where the substitu-
tion node is, as we have argued before. In fact it
is lexically dependent (cf. “I know ((the problem)
and (that there is no solution to it))”, where the con-
juncts are licensed by the subcategorizations of the
verb “know”). On the other hand, it does not seem
reasonable to expand the UCP node at the hosting
tree – a cross product explosion. A possible way of
alleviating this effect could be to expand only the
auxiliary trees (a UCP modifying a VP is distinct
from a UCP modifying an NP, and moreover they
are independent of lexical items). But for true argu-
ment positions there seems to be no clear solution.
Second, the oddity of the UCP as a label becomes
apparent once again when there are multiple con-
juncts, as in Figure 8: it is enough for one of them to
be distinct to turn the entire constituent into a UCP.
Recursive decomposition in the grammar in these
situations clearly leads to some non-standard trees.
Finally, and more crucially, we have omitted one case in our discussion: the case in which the UCP is the natural head child of some node. Under some accounts of grammar development this never happens: we have observed that a UCP does not appear as head child in the account where the head is the syntactic head of a node. We have not always followed this rule. With respect to the VP head, so far we have followed one major tendency in the computational implementation of lexicalized grammars, according to which lexical verbs are preferred to auxiliary verbs as heads of the VP. Now, consider the pair of sentences in Figure 9.
(NP (UCP (JJ electronic)
(, ,)
(NN computer)
(CC and)
(NN building))
(NNS products))
Figure 8: UCP with multiple conjuncts
(S (NP-SBJ-1 The Series 1989 B bonds)
(VP (VBP are)
(VP (VBN rated)
(S *-1 double-A))))
(S (NP-SBJ-1 The Series 1989 B bonds)
(VP (VBP are)
(UCP-PRD (ADJP-PRD (JJ uninsured))
(CC and)
(VP (VBN rated)
(S *-1 double-A)))))
Figure 9: UCP involving VP argument of the copula
Under the lexical verb paradigm, in the first sen-
tence the derivation would start with an initial tree
anchored by the past participle verb (“rated”). But
then we have an interesting problem in the second
sentence, for which we do not currently have a neat
solution. Following Xia’s sample settings of Lex-
Tract parameters, in these cases the extraction is
rescued by switching to the other paradigm: the ini-
tial tree is extracted anchored by the auxiliary verb
with a UCP argument, and the VP is accepted as a
possible conjunct. A systematic move to the syn-
tactic head paradigm, which we may indeed try,
would have important consequences in the locality
assumptions for the grammar development.
3.4 VP topicalization
Another problem with the lexical verb paradigm (see also the discussion under UCP above) is VP topicalization, as in the sentence in Figure 10. The solution currently adopted (again, inherited from Xia’s sample settings) is as above: the paradigm is switched and the auxiliary verb (“be”) is chosen as the anchor of the initial tree.
(SINV (ADVP (RB Also))
(VP-TPC-2 (VBN excluded)
(NP (-NONE- *-1)))
(VP (MD will)
(VP (VB be)
(VP (-NONE- *T*-2))))
(NP-SBJ-1 investments in ...))
Figure 10: VP topicalization
(S (NP-SBJ (NNP Congress))
(VP (MD could)
(VP (VB pass)
(ADVP-MNR (RB quickly))
(NP (NP (DT a)
(`` ``)
(JJ clean)
('' '')
(NN bill))
(VP (VBG containing)
(ADVP (JJ only))
(NP ... ))))))
(S NP↓ (VP V◊ NP↓))
Figure 11: The extraposition problem
3.5 Extraposition and Verb Subcategorization
One of the key design principles that have been
guiding grammar development with TAGs is to keep
verb arguments as substitution slots local to the
tree anchored by the verb. It is widely known that
the Penn Treebank does not distinguish verb ob-
jects from adjuncts. So some sorts of heuristics are
needed to decide, among the candidates, which are
to be taken as arguments (Kinyon and Prolo, 2002);
the rest is extracted as separate VP modifier trees.
However, this step is not enough for the trees to
correctly reflect verb subcategorizations. The oc-
currence of discontinuous arguments, frequently ex-
plained as argument extraposition (the argument is
raised past the adjunct) creates a problem. In the
sentence in Figure 11 the verb “pass” should anchor
a tree with one NP object.
However, in such a tree it would be impossible to adjoin the tree for the intervening ADVP “quickly” as a VP modifier and still have it between the verb and the NP.9 LexTract would then instead extract an intransitive tree for the VB “pass”, onto which the ADVP modifier tree would adjoin. The second oddity is that the NP object would also be extracted as a VP modifier tree. In a nutshell, objects in extracted trees are restricted to those which are not extraposed, and hence the trees may not truly reflect the proper domain of locality. One view is that the set of trees for a certain subcategorization frame would include these degenerate cases. Alternatively, LexTract has an option to allow limited discontinuity, i.e., a non-argument sequence between the verb and the first object (but not between two objects). The non-arguments would then be adjoined to the V node.10 So far we have used only the latter alternative.
9A striking use of sister adjunction in (Chiang, 2000) is exactly the elegant way it solves this problem: the non-argument tree can be adjoined onto a node (say, VP), positioning itself in between the VP’s children, which is not possible with TAGs.
(NP (NP the 3 billion New Zealand dollars)
(PRN (-LRB- -LRB-)
(NP US$ 1.76 billion *U*)
(-RRB- -RRB-)))
a) A parenthetical NP attached to another NP
(S (NP-SBJ The total relationship)
(PRN (, ,)
(SBAR-ADV as Mr. Lee sees it)
(, ,))
(VP (VBZ is) ...))
b) A parenthetical S between subject and verb
Figure 12: Parentheticals
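The limited-discontinuity condition (non-arguments may intervene between the verb and the first object, but not between two objects) can be stated as a simple check over the verb's right siblings. The function name `objects_stay_local` and the encoding of siblings as a list of role strings are assumptions for this sketch; it says nothing about how the LexTract option is implemented internally.

```python
def objects_stay_local(roles):
    """roles: roles ('arg' or 'mod') of the verb's right siblings, left to right.

    Returns True if all objects can be kept as substitution nodes in the verb's
    tree under limited discontinuity, i.e. no modifier occurs between two objects.
    """
    if "arg" not in roles:
        return True
    first = roles.index("arg")
    last = len(roles) - 1 - roles[::-1].index("arg")
    return all(r == "arg" for r in roles[first:last + 1])
```

For the sentence in Figure 11, the siblings of “pass” are an ADVP modifier followed by the NP object, i.e. `["mod", "arg"]`, which passes the check; a modifier between two objects, `["arg", "mod", "arg"]`, does not.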
It is worth mentioning two other cases of extra-
position. Subject extraposition is handled by having
the extraposed subject, usually a sentential form, ad-
join at the VP of which it is the logical subject (the
original position is still occupied by an NP with the
expletive pronoun “it”). Relative clause extraposi-
tion is modeled by a relative clause tree, only it ad-
joins at a VP, instead of at an NP as is usual.
3.6 Parentheticals
Parenthetical expressions are ubiquitous in language: they may appear almost anywhere in a sentence and can be of almost any category (Fig. 12). We model them as adjoining either to the left or to the right of the constituent they are dominated by, depending on whether they are to the left or to the right of the head child of the parent node. Occasionally such trees can also be initial. The respective trees for the examples of Figure 12 are drawn in Figure 13. It is always the case that the label PRN dominates a single substitution node. Whenever this was not the case in the training corpus, heuristics based on observation were used to enforce it, by inserting an appropriate missing node.
(NP NP* (PRN NP↓))        (S (PRN SBAR↓) S*)
Figure 13: Extracted trees for parentheticals
10Of course, although the solution covers most of the occurrences, and apart from any linguistic concern, there are still uncovered cases, e.g., when a parenthetical expression intervenes between the first and the second argument.
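The left/right choice for parenthetical trees can be illustrated with a tiny helper that builds the auxiliary tree in bracket notation. The function `parenthetical_tree` and its parameters are hypothetical; the sketch assumes the PRN already dominates a single node, whose label becomes the substitution node (marked ↓), and it does not cover the occasional initial trees mentioned above.

```python
def parenthetical_tree(parent_label, prn_index, head_index, inner_label):
    """Auxiliary tree for a PRN child of `parent_label`.

    prn_index and head_index are the positions of the PRN and of the head child
    among the parent's children; inner_label is the category inside the PRN.
    """
    prn = f"(PRN {inner_label}↓)"
    foot = f"{parent_label}*"
    if prn_index < head_index:
        # PRN precedes the head child: it adjoins to the left of the foot.
        return f"({parent_label} {prn} {foot})"
    # PRN follows the head child: it adjoins to the right of the foot.
    return f"({parent_label} {foot} {prn})"
```

Applied to Figure 12.a (PRN after the head NP) and 12.b (PRN before the head VP), the helper yields a right-adjoining NP tree and a left-adjoining S tree, respectively.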
3.7 Projection labels
LexTract extracts trees with no concern for the ap-
propriate projective structure of constituents when
not explicitly marked in the PTB. Figure 14 shows
two examples of NP modification where the modi-
fiers are single lexical items. The extracted modifier
trees, shown on the right, do not have the projec-
tion for the modifiers JJR “stronger” and the NNP
“October” (which should be, respectively, an ADJP
and an NP). That is so, because those nodes are not
found in the annotation.
(NP (DT a)
(JJR stronger)
(NN argument))
(NP-SBJ-1 (NNP October)
(NN weather))
(NP JJR◊ NP*)
(NP NNP◊ NP*)
Figure 14: Simple modification annotation and extracted trees
However, if the modifiers are complex, that is, if
the modifiers are themselves modified, the PTB in-
serts their respective projections, and therefore they
appear in the extracted trees, as shown in Figure 15.
(NP (DT an)
(ADJP (RB even)
(JJR stronger))
(NN argument))
(NP-SBJ-1 (NP (JJ late)
(NNP October))
(NN weather))
(NP (ADJP JJR◊) NP*)
(NP (NP NNP◊) NP*)
Figure 15: Complex modification annotation and extracted trees
There seems to be no reason for the two pairs of extracted trees to be different. Much of this is caused by the acknowledged flatness of the Penn Treebank annotation. That said, trees like those in the second pair should be preferred. The projection node (ADJP or NP) is understood to dominate its head even when there is no further modification, and it should be a concern of a good extraction process to insert the missing node into the grammar. Since LexTract does not allow us to specify the insertion of “obligatory” projections, we had to accomplish this through a somewhat complicated post-processing step using a projection table.
Some of our current projections are: nouns, personal pronouns and the existential expletive to NP; adjectives to ADJP; adverbs to ADVP; sentences either to SBAR (S, SINV) or to SBARQ (SQ); and cardinals (CD) to Quantifier Phrases (QP), which themselves project to NP. Notice that not all categories are obligatorily projected. For instance, verbs are not, allowing for simple auxiliary extraction. IN is also not projected, due to its double role as PP head (true preposition) and subordinating conjunction, the latter of which should project onto SBAR.
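A projection table of this kind might be applied to extracted modifier trees roughly as follows. The dictionary below encodes only the projections listed above, using the PTB tags we take them to correspond to (an assumption, as is the restriction to anchors of modifier trees); `projection_chain`, `wrap` and `add_projections` are hypothetical names, and trees are nested lists with anchors marked by a trailing ◊.

```python
# Projections listed in the text, keyed by PTB tag (tag choice is an assumption).
PROJECTION = {
    "NN": "NP", "NNS": "NP", "NNP": "NP", "NNPS": "NP",   # nouns
    "PRP": "NP", "EX": "NP",                              # personal pronouns, expletive "there"
    "JJ": "ADJP", "JJR": "ADJP", "JJS": "ADJP",           # adjectives
    "RB": "ADVP", "RBR": "ADVP", "RBS": "ADVP",           # adverbs
    "S": "SBAR", "SINV": "SBAR", "SQ": "SBARQ",           # sentences
    "CD": "QP", "QP": "NP",                               # cardinals, via QP
}

def projection_chain(label):
    """Labels that must dominate `label`, innermost first (e.g. CD -> [QP, NP])."""
    chain = []
    while label in PROJECTION:
        label = PROJECTION[label]
        chain.append(label)
    return chain

def wrap(leaf):
    """Wrap a bare anchor such as 'JJR◊' in its obligatory projections."""
    node = leaf
    for label in projection_chain(leaf.rstrip("◊")):
        node = [label, node]
    return node

def add_projections(aux_tree):
    """Insert missing projections above bare anchors directly under the root."""
    root, *kids = aux_tree
    return [root, *[wrap(k) if isinstance(k, str) and k.endswith("◊") else k
                    for k in kids]]
```

Applied to the trees of Figure 14, `add_projections(["NP", "JJR◊", "NP*"])` gives the ADJP-projected tree of Figure 15, and similarly for the NNP modifier; verbs and IN are absent from the table and are left untouched.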
4 Conclusion
We discussed an experiment in grammar extraction from corpora, with a focus on problems arising while trying to give an adequate account of naturally occurring phenomena. Without being exhaustive in our list, we hope to have brought some attention to the need to discuss solutions for them which are as reasonable as possible given the current state of the art in linguistic research, computational grammar development and automatic extraction, and given the corpus resources currently at our disposal.
ACKNOWLEDGEMENTS: Thanks to Tonia Bleam, Erwin Chan, Alexandra Kinyon, Rashmi Prasad, Beatrice Santorini, Fei Xia and the XTAG Group for valuable discussions during the course of this work and/or comments on this paper or related material.

References
Joan Bresnan and Jane Grimshaw. 1978. The syntax of free relatives in English. Linguistic Inquiry, 9(3):331–391.
John Chen and K. Vijay-Shanker. 2000. Automated
extraction of TAGs from the Penn Treebank. In
Proceedings of the 6th International Workshop on
Parsing Technologies, Trento, Italy.
David Chiang. 2000. Statistical parsing with an
automatically-extracted Tree Adjoining Gram-
mar. In Proceedings of the 38th Annual Meeting
of the Association for Computational Linguistics,
Hong Kong, China.
Robert Frank. 2002. Phrase Structure Composition
and Syntactic Dependencies. MIT Press, Cam-
bridge, MA, USA.
Anneke Groos and Henk von Riemsdijk. 1979. The
matching effects in free relatives: a parameter of
core grammar. In Theory of Markedness in Gen-
erative Grammar. Scuola Normale Superiore di
Pisa, Italy.
Aravind K. Joshi and Yves Schabes. 1997. Tree-
Adjoining Grammars. In Handbook of Formal
Languages, volume 3, pages 69–123. Springer-
Verlag, Berlin.
Alexandra Kinyon and Carlos A. Prolo. 2002.
Identifying verb arguments and their syntactic
function in the Penn Treebank. In Proc. of the
Third LREC, pages 1982–87, Las Palmas, Spain.
Mitchell Marcus, Beatrice Santorini, and Mary Ann
Marcinkiewicz. 1993. Building a large annotated
corpus of English: The Penn Treebank. Compu-
tational Linguistics, 19(2):313–330.
Mitchell Marcus, Grace Kim, Mary Ann
Marcinkiewicz, Robert MacIntyre, Ann Bies,
Mark Ferguson, Karen Katz, and Britta Schas-
berger. 1994. The Penn Treebank: Annotating
predicate argument structure. In Proceedings
of the 1994 Human Language Technology
Workshop.
Carlos A. Prolo. 2002. LR parsing for Tree Adjoin-
ing Grammars and its application to corpus-based
natural language parsing. Ph.D. Dissertation Pro-
posal, University of Pennsylvania.
Fei Xia. 1999. Extracting tree adjoining grammars from bracketed corpora. In Proceedings of the 5th Natural Language Processing Pacific Rim Symposium (NLPRS-99), Beijing, China.
Fei Xia. 2001. Investigating the Relationship between Grammars and Treebanks for Natural Languages. Ph.D. thesis, Department of Computer and Information Science, University of Pennsylvania.
