Towards Automatic Generation of Natural Language Generation
Systems
John Chen , Srinivas Bangalorey, Owen Rambow , and Marilyn A. Walkery
Columbia University AT&T Labs{Researchy
New York, NY 10027 Florham Park, NJ 07932
fjchen,rambowg@cs.columbia.edu fsrini,walkerg@research.att.com
Abstract
Systems that interact with the user via natural
language are in their infancy. As these systems
mature and become more complex, it would be
desirable for a system developer if there were
an automatic method for creating natural lan-
guage generation components that can produce
quality output e ciently. We conduct experi-
ments that show that this goal appears to be
realizable. In particular we discuss a natural
language generation system that is composed of
SPoT, a trainable sentence planner, and FER-
GUS, a stochastic surface realizer. We show
how these stochastic NLG components can be
made to work together, that they can be ported
to new domains with apparent ease, and that
such NLG components can be integrated in a
real-time dialog system.
1 Introduction
Systems that interact with the user via natural
language are in their infancy. As these systems
mature and become more complex, it would
be desirable for a system developer if there
were automatic methods for creating natural
language generation (NLG) components that
can produce quality output e ciently. Stochas-
tic methods for NLG may provide such auto-
maticity, but most previous work (Knight and
Hatzivassiloglou, 1995), (Langkilde and Knight,
1998), (Oh and Rudnicky, 2000), (Uchimoto et
al., 2000), (Bangalore and Rambow, 2000) con-
centrate on the speci cs of individual stochastic
methods, ignoring other issues such as integra-
bility, portability, and e ciency. In contrast,
this paper investigates how di erent stochastic
NLG components can be made to work together
e ectively, whether they can easily be ported to
new domains, and whether they can be inte-
grated in a real-time dialog system.
Request(DEPART−DATE)
Surface Generator
FERGUS
TTS
SPoT
Dialog Manager
Sentence Planner
DM
Imp−conf(N)
soft−merge
Text to Speech
Implicit−confirm(NEWARK)
Implicit−confirm(DALLAS)
period
Imp−conf(D)
Flying from Newark to
Dallas.  What date would
you like to leave?
Request(D−D)
Figure 1: Components of an NLG system.
Recall the basic tasks in NLG. During text
planning, content and structure of the target
text are determined to achieve the overall com-
municative goal. During sentence planning, lin-
guistic means|in particular, lexical and syn-
tactic means|are determined to convey smaller
pieces of meaning. During realization, the spec-
i cation chosen in sentence planning is trans-
formed into a surface string by linearizing and
in ecting words in the sentence (and typically,
adding function words). Figure 1 shows how
such components cooperate to generate text
corresponding to a set of communicative goals.
Our work addresses both the sentence plan-
ning stage and the realization stage. The sen-
tence planning stage is embodied by the SPoT
sentence planner (Walker et al., 2001), while
the surface realization stage is embodied by the
FERGUS surface realizer (Bangalore and Ram-
bow, 2000). We extend the work of (Walker et
al., 2001) and (Bangalore and Rambow, 2000)
in various ways. We show that apparently each
of SPoT and FERGUS can be ported to di er-
ent domains with little manual e ort. We then
show that these two components can work to-
gether e ectively. Finally, we show the on-line
integration of FERGUS with a dialog system.
2 Testing the Domain Independence
of Sentence Planning
In this section, we address the issue of the
amount of e ort that is required to port a sen-
tence planner to new domains. In particular,
we focus on the SPoT sentence planner. The
 exibility of the training mechanism that SPoT
employs allows us to perform experiments that
provide evidence for its domain independence.
Being a sentence planner, SPoT chooses ab-
stract linguistic resources (meaning-bearing lex-
emes, syntactic constructions) for a text plan. A
text plan is a set of communicative goals which
is assumed to be output by a dialog manager of
a spoken dialog system. The output of SPoT is
a set of ranked sentence plans, each of which is
a binary tree with leaves labeled by the commu-
nicative goals of the text plan.
SPoT divides the sentence planning task into
two stages. First, the sentence-plan-generator
(SPG) generates 12-20 possible sentence plans
for a given input text plan. These are gener-
ated randomly by incrementally building each
sentence plan according to some probability
distribution. Second, the sentence-plan-ranker
(SPR) ranks the resulting set of sentence plans.
SPR is trained for this task via RankBoost (Fre-
und et al., 1998), a machine learning algorithm,
using as training data sets of sentence plans
ranked by human judges.
In porting SPoT to a new domain, this last
point seems to be a hindrance. New train-
ing data in the new domain ranked by hu-
man judges might be needed in order to train
SPoT. To the contrary, our experiments that
show that this need not be the case. We par-
tition the set of all features used by (Walker et
al., 2001) to train SPoT into three subsets ac-
cording to their level of domain and task de-
pendence. Domain independent features are
features whose names include only closed-class
words, e.g. \in," or names of operations that in-
crementally build the sentence plan, e.g. merge.
Domain-dependent, task-independent features
are those whose names include open class words
Features Used Mean Score S.D.
all 4.56 0.68
domain-independent 4.55 0.69
task-independent 4.20 0.99
task-dependent 3.90 1.19
Table 1: Results for subsets of features used to
train SPoT
speci c to this domain, e.g. \travel" or the
names of the role slots, e.g. $DEST-CITY. Do-
main dependent, task dependent features are
features whose names include the value of a role
 ller for the domain, e.g. \Albuquerque."
We have trained and tested SPoT with these
di erent feature subsets using the air-travel do-
main corpus of 100 text plans borrowed from
(Walker et al., 2001), using  ve fold cross-
validation. Results are shown in Table 2 us-
ing t-tests with the modi ed Bonferroni statis-
tic for multiple comparisons. Scores can range
from 1.0 (worst) to 5.0 (best). The results in-
dicate that the domain independent feature set
performs as well as all the features (t = .168, p
= .87), but that both the task independent (t
= 6.25, p = 0.0) and the task dependent (t =
4.58, p = 0.0) feature sets perform worse.
3 Automation in Training a Surface
Realizer
As with the sentence planning task, there is the
possibility that the task of surface realization
may be made to work in di erent domains with
relatively little manual e ort. Here, we perform
experiments using the FERGUS surface realizer
to determine whether this may be so. We re-
view the FERGUS architecture, enumerate re-
sources required to train FERGUS, recapitulate
previous experiments that indicate how these
resources can be automatically generated, and
 nally show how similar ideas can be used to
port FERGUS to di erent domains with little
manual e ort.
3.1 Description of the FERGUS
Surface Realizer
Given an underspeci ed dependency tree repre-
senting one sentence as input, FERGUS outputs
the best surface string according to its stochas-
tic modeling. Each node in the input tree corre-
sponds to a lexeme. Nodes that are related by
grammatical function are linked together. Sur-
face ordering of the lexemes remains unspeci ed
in the tree.
FERGUS consists of three models: tree
chooser, unraveler, and linear precedence
chooser. The tree chooser associates a su-
pertag (Bangalore and Joshi, 1999) from a tree-
adjoining grammar (TAG) with each node in
the underspeci ed dependency tree. This par-
tially speci es the output string’s surface order;
it is constrained by grammatical constraints en-
coded by the supertags (e.g. subcategorization
constraints, voice), but remains free otherwise
(e.g. ordering of modi ers). The tree chooser
uses a stochastic tree model (TM) to select a
supertag for each node in the tree based on lo-
cal tree context. The unraveler takes the re-
sulting semi-speci ed TAG derivation tree and
creates a word lattice corresponding to all of
the potential surface orderings consistent with
this tree. Finally, the linear precedence (LP)
chooser  nds the best path through the word
lattice according to a trigram language model
(LM), specifying the output string completely.
Certain resources are required in order to
train FERGUS. A TAG grammar is needed|
the source of the supertags with which the
semi-speci ed TAG derivation tree is annotated.
There needs to be a treebank in order to ob-
tain the stochastic model TM driving the tree
chooser. There also needs to be a corpus of sen-
tences in order to train the language model LM
required for the LP chooser.
3.2 Labor-Minimizing Approaches to
Training FERGUS
The resources that are needed to train FER-
GUS seem quite labor intensive to develop. But
(Bangalore et al., 2001) show that automati-
cally generated version of these resources can
be used by FERGUS to obtain quality output.
Two kinds of TAG grammar are used in (Ban-
galore et al., 2001). One kind is a manually de-
veloped, broad-coverage grammar for English:
the XTAG grammar (XTAG-Group, 2001). It
consists of approximately 1000 tree frames. Dis-
advantages of using XTAG are the consider-
able amount of human labor expended in its
development and the lack of a treebank based
on XTAG|the only way to estimate parame-
ters in the TM is to rely on a heuristic map-
ping of XTAG tree frames onto a pre-existing
treebank (Bangalore and Joshi, 1999). Another
kind of grammar is a TAG automatically ex-
tracted from a treebank using the techniques of
(Chen, 2001) (cf. (Chiang, 2000), (Xia, 1999)).
These techniques extract a linguistically mo-
tivated TAG using heuristics programmed us-
ing a modicum of human labor. They nullify
the disadvantages of using the XTAG grammar,
but they introduce potential complications|
notably, an extracted grammar’s size is often
much larger than that of XTAG, typically more
than 2000 tree frames, potentially leading to a
larger sparse data problem, and also the result-
ing grammar is not hand-checked.
Two kinds of treebank are used in (Bangalore
et al., 2001). One kind is the Penn Treebank
(Marcus et al., 1993). It consists of approxi-
mately 1,000,000 words of hand-checked, brack-
eted text. The text consists of Wall Street Jour-
nal news articles. The other kind of treebank is
the BLLIP corpus (Charniak, 2000). It con-
sists of approximately 40,000,000 words of text
that has been parsed by a broad-coverage sta-
tistical parser. The text consists of Wall Street
Journal news and newswire articles. The ad-
vantage of the former is that it has been hand-
checked, whereas the latter has the advantage
of being easily produced and hence can easily
be enlarged.
(Bangalore et al., 2001) experimentally de-
termine how the quality and quantity of the
resources used in training FERGUS a ect the
output quality of the generator. They  nd that
while a better quality annotated corpus (Penn
Treebank) results in better model accuracy than
a lower quality corpus (BLLIP) of the same size,
an (easily-obtained) larger lower quality corpus
results in a model that eclipses a smaller, better
quality treebank. Also, the model that is ob-
tained when using an automatically extracted
grammar yields comparable output quality to
the model that is obtained when using a hand-
crafted (XTAG) grammar.
3.3 Automating Adaptation of
FERGUS to a New Domain
This paper is about minimizing the amount of
manual labor that is required to port NLG com-
ponents to di erent domains. (Bangalore et
al., 2001) perform all of their experiments on
the same domain of Wall Street Journal news
articles. In contrast, in this section we show
that FERGUS can be adapted to the domain of
air-travel reservation dialogs with minimal hu-
man e ort. We show that out-of-domain train-
ing data can be used instead of in-domain data
without drastically compromising output qual-
ity. We also show that partially parsed in-
domain training data can be e ectively used
to train the TM. Finally, we show that using
an in-domain corpus to train the LM can help
the output quality, even if that corpus is of
small size. In this section, we  rst describe the
training resources that are used in these exper-
iments. We subsequently describe the experi-
ments themselves and their results.
Various corpora are used in these experi-
ments. For training, there are two distinct
corpora. First, there is the previously in-
troduced Penn Treebank (PTB). As the
alternative, there is a human-human corpus of
dialogs (HH) from Carnegie Mellon University.
The HH corpus consists of approximately
13,000 words in the air-travel reservation
domain. This is not exactly the target domain
because human-human interaction di ers
from human-computer interaction which is
our true target domain. From this raw text,
an LDA parser (Bangalore and Joshi, 1999)
trained using the XTAG-based Penn Treebank
creates a partially-parsed, non-hand-checked
treebank. Test data consists of about 2,200
words derived from Communicator template
data. Communicator templates are hand-
crafted surface strings of words interspersed
with slot names. An example is \What time
would you, traveling from $ORIG-CITY
to $DEST-CITY like to leave?" The test
data is derived from all strings like these, with
duplicates, in the Communicator system by
replacing the slot names with  llers according
to a probability distribution. Furthermore,
dependency parses are assigned to the resulting
strings by hand.
In the  rst series of experiments, we ascertain
the output quality of FERGUS using the XTAG
grammar on di erent training corpora. We vary
the TM’s training corpus to be either PTB or
HH. We do the same for the LM’s training cor-
pus. Assessing the output quality of a generator
is a complex issue. Here, we select as our met-
ric understandability accuracy, de ned in (Ban-
galore et al., 2000) as quantifying the di er-
PTB TM HH TM
PTB LM 0.30 0.38
HH LM 0.37 0.41
Table 2: Average understandability accuracies
using XTAG-Based FERGUS for various kinds
of training data
PTB TM
PTB LM 0.39
HH LM 0.33
Table 3: Average understandability accuracies
using automatically-extracted grammar based
FERGUS for various kinds of training data
ence between the generator output, in terms of
both dependency tree and surface string, and
the desired reference output. (Bangalore et al.,
2000)  nds this metric to correlate well with hu-
man judgments of understandability and qual-
ity. Understandability accuracy varies between
a high score of 1.0 and a low score which may
be less than zero.
The results of our experiments are shown in
Table 2. We conclude that despite its smaller
size, and despite its being only automatically-
and partially- parsed, using the in-domain HH
is more e ective than using the out-of-domain
PTB for training the TM. Similarly, HH is more
e ective than PTB for training the LM. The
best result is obtained by using HH to train both
the TM and the LM; this result (0.41) is com-
parable to the result obtained by using matched
PTB training and test data (0.43) that is used
in (Bangalore et al., 2001).
The second series of experiments investi-
gates the output quality of FERGUS using
automatically-extracted grammars. In these ex-
periments, the TM is always trained on PTB
but not HH. It is the type of training data that
is used to train the LM, either PTB or HH,
that is varied. The results are shown in Ta-
ble 3. Note that these scores are in the same
range as those obtained when training FER-
GUS using XTAG. Also, these scores show that
when using automatically-extracted grammars,
training LM using a large, out-of-domain cor-
pus (PTB) is more bene cial than training LM
using a small, in-domain corpus (HH).
We can now draw various conclusions about
training FERGUS in a new domain. Con-
sider training the TM. It is not necessary
to use a handwritten TAG in the new do-
main; a broad-coverage hand-written TAG or
an automatically-extracted TAG will give com-
parable results. Also, instead of requiring a
hand-checked treebank in the new domain, par-
tially parsed data in the new domain is ade-
quate. Now consider training the LM. Our ex-
periments show that a small corpus in the new
domain is a viable alternative to a large corpus
that is out of the domain.
4 Integration of SPoT with
FERGUS
We have seen evidence that both SPoT and
FERGUS may be easily transferable to a new
domain. Because the output of a sentence plan-
ner usually becomes the input of a surface real-
izer, questions arise such as whether SPoT and
FERGUS can be made to work together in a
new domain and what is the output quality of
the combined system. We will see that an ad-
dition of a rule-based component to FERGUS
will be necessary in order for this integration
to occur. We will subsequently see that the
output quality of the resulting combination of
SPoT and FERGUS is quite good.
Integration of SPoT as described in Section 2
and FERGUS as described in Section 3 is not
automatic. The reason is that the output of
SPoT is a deep syntax tree (Mel’ cuk, 1998)
whereas hitherto the input of FERGUS has
been a surface syntax tree. The primary dis-
tinguishing characteristic of a deep syntax tree
is that it contains features for categories such as
de niteness for nouns, or tense and aspect for
verbs. In contrast, a surface syntax tree real-
izes these features as function words. However,
there is a one-to-one mapping from features of
a deep syntax tree to function words in the cor-
responding surface syntax tree. Therefore, inte-
grating SPoT with FERGUS is basically a mat-
ter of performing this mapping. We have added
a rule-based component (RB) as the new  rst
stage of FERGUS to do just that. Note that it
is erroneous to think that RB makes choices be-
tween di erent generation options because there
is a one-to-one mapping between features and
function words.
PTB TM HH TM
PTB LM 0.48 0.47
HH LM 0.73 0.68
Table 4: Average understandability accuracies
of SPoT-integrated, XTAG-Based FERGUS for
various kinds of training data
After the addition of RB to FERGUS, we
evaluate the output quality of the combination
of SPoT and FERGUS. Only the XTAG gram-
mar is used in this experiment. As in previous
experiments with the XTAG grammar, there is
either the option of training using HH or PTB
derived data for either the TM or LM, giving a
total of four possibilities.
Test data is obtained by output strings that
are produced by the combination of SPoT and
the RealPro surface realizer (Lavoie and Ram-
bow, 1998). RealPro has the advantage of pro-
ducing high quality surface strings, but at the
cost of having to be hand-tuned to a particu-
lar domain. It is this cost we are attempting to
minimize by using FERGUS. Only those sen-
tence plans produced by SPoT ranked 3.0 or
greater by human judges are used. The surface
realization of these sentence plans yields a test
corpus of 2,200 words.
As shown in Table 4, the performance of
SPoT and FERGUS combined is quite high.
Also note that in terms of training the LM, out-
put quality is markedly better when HH is used
rather than PTB. Furthermore, note that there
is a smaller di erence between using PTB or
HH to train TM when compared to previous re-
sults shown in Table 2. This seems to indicate
that the TM’s e ect on output quality dimin-
ishes because of addition of RB to FERGUS.
5 On-line Integration of FERGUS
with a Dialog System
Certain statistical natural language processing
systems can be quite slow, usually because of
the large search space that these systems must
explore. It is therefore uncertain whether a sta-
tistical NLG component can be integrated into a
real-time dialog system. Investigating the mat-
ter in FERGUS’s case, we have experimented
with integrating FERGUS into Communicator,
a mixed-initiative, airline travel reservation sys-
tem. We begin by explaining how Communica-
tor manages surface generation without FER-
GUS. We then delineate several possible kinds
of integration. Finally, we describe our experi-
ences with one kind of integration.
Communicator performs only a rudimentary
form of surface generation as follows. The
dialog manager of Communicator issues a set
of communicative goals that are to be realized.
Surface template strings are selected based
on this set, such as \What time would you,
traveling from $ORIG-CITY to $DEST-CITY
like to leave?" The slot names in these
strings are then replaced with  llers according
to the dialog manager’s state. The resulting
strings are then piped to a text-to-speech
component (TTS) for output.
There are several possibilities as to how FER-
GUS may supplant this system. One possibility
is o -line integration. In this case, the set of all
possible sets communicative goals for which the
dialog manager requires realization are matched
with a set of corresponding surface syntax trees.
The latter set is input to FERGUS, which gen-
erates a set of surface template strings, which in
turn is used to replace the manually created sur-
face template strings that are an original part
of Communicator. Since these changes are pre-
compiled, the resulting version of Communica-
tor is therefore as fast the original. On the other
hand, o -line integration may be unmanageable
if the set of sets of communicative goals is very
large. In that case, only the alternative of on-
line integration is palatable. In this approach,
each surface template string in Communicator is
replaced with its corresponding surface syntax
tree. At points in a dialog where Communicator
requires surface generation, it sends the appro-
priate surface syntax trees to FERGUS, which
generates surface strings.
We have implemented the on-line integration
of FERGUS with Communicator. Our experi-
ments show that FERGUS is fast enough to be
used in for this purpose, the average time for
FERGUS to generate output strings for one di-
alog turn being only 0.28 seconds.
6 Conclusions and Future Work
We have performed experiments that provide
evidence that components of a statistical NLG
system may be ported to di erent domains
without a huge investment in manual labor.
These components include a sentence planner,
SPoT, and a surface realizer, FERGUS. SPoT
seems easily portable to di erent domains be-
cause it can be trained well using only domain-
independent features. FERGUS may also be
said to be easily portable because our experi-
ments show that the quality and quantity of in-
domain training data need not be high and plen-
tiful for decent results. Even if in-domain data
is not available, we show that out-of-domain
training data can be used with adequate results.
By integrating SPoT with FERGUS, we have
also shown that di erent statistical NLG com-
ponents can be made to work well together. In-
tegration was achieved by adding a rule-based
component to FERGUS which transforms deep
syntax trees into surface syntax trees. The com-
bination of SPoT and FERGUS performs with
high accuracy. Post-integration, there is a di-
minishing e ect of TM on output quality.
Finally, we have shown that a statistical NLG
component can be integrated into a dialog sys-
tem in real time. In particular, we replace the
hand-crafted surface generation of Communica-
tor with FERGUS. We show that the resulting
system performs with low latency.
This work may be extended in di erent di-
rections. Our experiments showed promising re-
sults in porting to the domain of air travel reser-
vations. Although this is a reasonably-sized do-
main, it would be interesting to see how our
 ndings vary for broader domains. Our experi-
ments used a partially parsed version of the HH
corpus. We would like to compare its use as
TM training data in relation to using a fully
parsed version of HH, and also a hand-checked
treebank version of HH. We would also like to
investigate the possibility of interpolating mod-
els based on di erent kinds of training data in
order to ameliorate data sparseness. Our ex-
periments focused on integration between the
NLG components of sentence planning and sur-
face generation. We would like to explore the
possibility of further integration, in particular
integrating these components with TTS. This
would provide the bene t of enabling the use of
syntactic and semantic information for prosody
assignment. Also, although FERGUS was inte-
grated with SPoT relatively easily, it does not
necessarily follow that FERGUS can be inte-
grated easily with other kinds of components.
It may be worthwhile to envision a redesigned
version of FERGUS whose input can be  exibly
underspeci ed in order to accommodate di er-
ent kinds of modules.
7 Acknowledgments
This work was partially funded by DARPA un-
der contract MDA972-99-3-0003.

References

Srinivas Bangalore and A. K. Joshi. 1999. Su-
pertagging: An approach to almost parsing.
Computational Linguistics, 25(2).

Srinivas Bangalore and Owen Rambow. 2000.
Exploiting a probabilistic hierarchical model
for generation. In Proceedings of the 18th
International Conference on Computational
Linguistics (COLING 2000).

Srinivas Bangalore, Owen Rambow, and Steve
Whittaker. 2000. Evaluation metrics for gen-
eration. In Proceedings of the First Interna-
tional Conference on Natural Language Gen-
eration, Mitzpe Ramon, Israel.

Srinivas Bangalore, John Chen, and Owen
Rambow. 2001. Impact of quality and quan-
tity of corpora on stochastic generation. In
Proceedings of the 2001 Conference on Em-
pirical Methods in Natural Langauge Process-
ing, Pittsburgh, PA.

Eugene Charniak. 2000. A maximum-entropy-
inspired parser. In Proceedings of First An-
nual Meeting of the North American Chap-
ter of the Association for Computational Lin-
guistics, Seattle, WA.

John Chen. 2001. Towards E cient Statis-
tical Parsing Using Lexicalized Grammati-
cal Information. Ph.D. thesis, University of
Delaware.

David Chiang. 2000. Statistical parsing with an
automatically-extracted tree adjoining gram-
mar. In Proceedings of the the 38th Annual
Meeting of the Association for Computational
Linguistics, pages 456{463, Hong Kong.

Yoav Freund, Raj Iyer, Robert E. Schapire, and
Yoram Singer. 1998. An e cient boosting
algorithm for combining preferences. In Ma-
chine Learning: Proceedings of the Fifteenth
International Conferece.

Kevin Knight and V. Hatzivassiloglou. 1995.
Two-level many-paths generation. In Pro-
ceedings of the 33rd Annual Meeting of the
Association for Computational Linguistics,
Boston, MA.

Irene Langkilde and Kevin Knight. 1998. Gen-
eration that exploits corpus-based statisti-
cal knowledge. In Proceedings of the 17th
International Conference on Computational
Linguistics and the 36th Annual Meeting of
the Association for Computational Linguis-
tics, Montreal, Canada.

Benoit Lavoie and Owen Rambow. 1998. A
framework for customizable generation of
multi-modal presentations. In Proceedings of
the 17th International Conference on Com-
putational Linguistics and the 36th Annual
Meeting of the Association for Computational
Linguistics, Montreal, Canada.

Mitchell Marcus, Beatrice Santorini, and
Mary Ann Marcinkiewicz. 1993. Building
a large annotated corpus of english: the
penn treebank. Computational Linguistics,
19(2):313{330.

Igor A. Mel’ cuk. 1998. Dependency Syntax:
Theory and Practice. State University of New
York Press, New York, NY.

Alice H. Oh and Alexander I. Rudnicky.
2000. Stochastic language generation for spo-
ken dialog systems. In Proceedings of the
ANLP/NAACL 2000 Workshop on Conver-
sational Systems, pages 27{32, Seattle, WA.

Kiyotaka Uchimoto, Masaki Murata, Qing Ma,
Satoshi Sekine, and Hitoshi Isahara. 2000.
Word order acquisition from corpora. In Pro-
ceedings of the 18th International Confer-
ence on Computational Linguistics (COLING
’00), Saarbr ucken, Germany.

Marilyn A. Walker, Owen Rambow, and Mon-
ica Rogati. 2001. Spot: A trainable sentence
planner. In Proceedings of the Second Meeting
of the North American Chapter of the Asso-
ciation for Computational Linguistics, pages
17{24.

Fei Xia. 1999. Extracting tree adjoining gram-
mars from bracketed corpora. In Fifth Natu-
ral Language Processing Paci c Rim Sympo-
sium (NLPRS-99), Beijing, China.

The XTAG-Group. 2001. A Lexicalized
Tree Adjoining Grammar for English.
Technical report, University of Penn-
sylvania. Updated version available at
http://www.cis.upenn.edu/~xtag.
