Lexicalized Grammar 101
Matthew Stone
Department of Computer Science and Center for Cognitive Science
Rutgers, the State University of New Jersey
Piscataway NJ 08854-8019 USA
http://www.cs.rutgers.edu/~mdstone
mdstone@cs.rutgers.edu
Abstract
This paper presents a simple and ver-
satile tree-rewriting lexicalized grammar
formalism, TAGLET, that provides an ef-
fective scaffold for introducing advanced
topics in a survey course on natural lan-
guage processing (NLP). Students who
implement a strong competence TAGLET
parser and generator simultaneously get
experience with central computer science
ideas and develop an effective starting
point for their own subsequent projects in
data-intensive and interactive NLP.
1 Introduction
This paper is particularly addressed to readers at in-
stitutions whose resources and organization rule out
extensive formal course-work in natural language
processing (NLP). This is typical at universities in
North America. In such places, NLP teaching must
be ambitious but focused; courses must quickly
acquaint a broad range of students with the essential
concepts of the field and sell them on its current
research opportunities and challenges. This paper
presents one resource that may help. Specifically,
I outline a simple and versatile lexicalized formal-
ism for natural language syntax, semantics and prag-
matics, called TAGLET, and draw on my experi-
ence with CS 533 (NLP) at Rutgers to motivate the
potential role for TAGLET in a broad NLP class
whose emphasis is to introduce topics of current re-
search. Notes, assignments and implementations for
TAGLET are available on the web.
I begin in Section 2 by describing CS 533—
situating the course within the university and outlin-
ing its topics, audience and goals. I then describe the
specific goals for teaching and implementing gram-
mar formalisms within such a course, in Section 3.
Section 4 gives an informal overview of TAGLET,
and the algorithms, specifications and assignments
that fit TAGLET into a broad general NLP class.
In brief, TAGLET is a context-free tree-rewriting
formalism, defined by the usual complementation
operation and the simplest imaginable modifica-
tion operation. By implementing a strong compe-
tence TAGLET parser and generator students simul-
taneously get experience with central computer sci-
ence ideas—data structures, unification, recursion
and abstraction—and develop an effective starting
point for their own subsequent projects. Two note-
worthy directions are the construction of interac-
tive applications, where TAGLET’s relatively scal-
able and reversible processing lets students easily
explore cutting-edge issues in dialogue semantics
and pragmatics, and the development of linguistic
specifications, where TAGLET’s ability to lexical-
ize tree-bank parses introduces a modern perspec-
tive of linguistic intuitions and annotations as pro-
grams. Section 5 briefly summarizes the advantages
of TAGLET over the many alternative formalisms
that are available; an appendix to the paper provides
more extensive technical details.
Proceedings of the Workshop on Effective Tools and Methodologies for Teaching
Natural Language Processing and Computational Linguistics, Philadelphia,
July 2002, pp. 77-84. Association for Computational Linguistics.
2 CS 533
NLP at Rutgers is taught as part of the graduate
artificial intelligence (AI) sequence in the computer
science department. As a prerequisite, computer
science students are expected to be familiar with
probabilistic and decision-theoretic modeling (including
statistical classification, hidden Markov models and
Markov decision processes) from the graduate-level
AI foundations class. They might take NLP as a pre-
liminary to research in dialogue systems or in learn-
ing for language and information—or simply to ful-
fill the breadth requirement of MS and PhD degrees.
Students from a number of other departments fre-
quently get involved in natural language research,
however, and are also welcome in 533; on average,
only about half the students in 533 come from com-
puter science. Students from the linguistics depart-
ment frequently undertake computational work as
a way of exploring practical learnability as a con-
straint on universal grammar, or practical reasoning
as a constraint on formal semantics and pragmatics.
The course also attracts students from Rutgers’s li-
brary and information science department, its pri-
mary locus for research in information retrieval and
human-computer interaction. Ambitious undergraduates
can also take 533 in their senior year; most
participate in the interdisciplinary cognitive science
undergraduate major. 533 is the only computational
course in natural language at Rutgers.
Overall, the course is structured into three mod-
ules, each of which represents about fifteen hours of
in-class lecture time.
The first module gives a general overview of lan-
guage use and dialogue applications. Lectures fol-
low (Clark, 1996), but instill the practical method-
ology for specifying and constructing knowledge-
based systems, in the style of (Brachman et al.,
1990), into the treatment of communication. Con-
currently, students explore precise descriptions of
their intuitions about language and communication
through a series of short homework exercises.
The second module focuses on general techniques
for linguistic representation and implementation, us-
ing TAGLET. With an extended TAGLET project,
conveniently implemented in stages, we use basic
tree operations to introduce Prolog programming,
including data structures, recursion and abstraction
much as outlined in (Sterling and Shapiro, 1994);
then we write a simple chart parser with incremental
interpretation, and a simple communicative-intent
generator scaled down after (Stone et al., 2001).
The third module explores the distinctive prob-
lems of specific applications in NLP, including spo-
ken dialogue systems, information retrieval and text
classification, spelling correction and shallow tag-
ging applications, and machine translation. Jurafsky
and Martin (2000) is our source-book. Concurrently,
students pursue a final project, singly or in cross-
disciplinary teams, involving a more substantial and
potentially innovative implementation.
In its overall structure, the course seems quite
successful. The initial emphasis on clarifying in-
tuitions about communication puts students on an
even footing, as it highlights important ideas about
language use without too much dependence on spe-
cialized training in language or computation. By the
end of the class, students are able to build on the
more specifically computational material to come up
with substantial and interesting final projects. In
Spring 2002 (the first time this version of 533 was
taught), some students looked at utterance interpre-
tation, response generation and graphics generation
in dialogue interaction; explored statistical methods
for word-sense disambiguation, summarization and
generation; and quantified the potential impact of
NLP techniques on information tasks. Many of these
results represented fruitful collaborations between
students from different departments.
Naturally, there is always room for improvement,
and the course is evolving. My presentation of
TAGLET here, for example, represents as much a
project for the next run of 533 as a report of this
year’s materials; in many respects, TAGLET actu-
ally emerged during the semester as a dynamic reac-
tion to the requirements and opportunities of a six-
week module on general techniques for linguistic
representation and implementation.
3 Language and Computation in NLP
In a survey course for a broad, research-oriented au-
dience, like CS 533 at Rutgers, a module on linguis-
tic representation must orient itself to central ideas
about computation. 533 may be the first and last
place linguistics or information science students en-
counter concepts of specification, abstraction, com-
plexity and search in class-work. The students who
attack interdisciplinary research with success will be
the ones who internalize and draw on these concepts,
not those who merely hack proficiently. At the same
time, computer scientists also can benefit from an
emphasis on computational fundamentals; it means
that they are building on and reinforcing their ex-
pertise in computation in exploring its application to
language. Nevertheless, NLP is not compiler con-
struction. Programming assignments should always
underline a worthwhile linguistic lesson, not indulge
in implementation for its own sake.
This perspective suggests a number of desiderata
for the grammar formalism for a survey course in
NLP.
Tree rewriting. Students need to master recur-
sive data-structures and programming. NLP directs
our attention to the recursive structures of linguistic
syntax. In fact, by adopting a grammar formalism
whose primitives operate on these structures as first-
class objects, we can introduce a rich set of relatively
straightforward operations to implement, and moti-
vate them by their role in subsequent programs.
Lexicalization. Students need to distinguish be-
tween specification and implementation, and to un-
derstand the barriers of abstraction that underlie
the distinction. Lexicalized grammars come with a
ready notion of abstraction. From the outside, ab-
stractly, a lexicalized grammar analyzes each sen-
tence as a simple combination of atomic elements
from a lexicon of options. Simultaneously, a con-
crete implementation can assign complex structures
to the atomic elements (elementary trees) and imple-
ment complex combinatory operations.
Strong competence implementation. Students
need to understand how natural language must and
does respond to the practical logic of physical re-
alization, like all AI (Agre, 1997). Mechanisms that
use grammars face inherent computational problems
and natural grammars in particular must respond to
these problems: students should undertake imple-
mentations which directly realize the operations of
the grammar in parsing and generation. But these
must be effective programs that students can build
on; our time and interest are too scarce for extensive
reimplementations.
Simplicity. Where possible, linguistic proposals
should translate readily to the formalism. At the
same time, students should be able to adapt aspects
of the formalism to explore their own judgments
and ideas. Where possible, students should get in-
tuitive and satisfying results from straightforward
algorithms implemented with minimal bookkeeping
and case analysis. At the same time, there is no rea-
son why the formalism should not offer opportuni-
ties for meaningful optimization.
We cannot expect any formalism to fare perfectly
by all these criteria—if any does, it is a deep fact
about natural language! Still, it is worth remark-
ing just how badly these criteria judge traditional
unification-based context-free grammars (CFGs), as
presented in, say, (Pereira and Shieber, 1987).
Data structures are an afterthought in CFGs; CFGs
cannot in principle be lexicalized; and, whatever their
merits in parsing or recognition, CFGs set up a pos-
itively abysmal search space for meaningful genera-
tion tasks.
4 TAGLET
TAGLET[1] is my response to the objectives
motivated in Section 2 and outlined in Section 3.
TAGLET represents my way of distilling the essen-
tial linguistic and computational insights of lexical-
ized tree-adjoining grammar—LTAG (Joshi et al.,
1975; Schabes, 1990)—into a form that students can
easily realize in end-to-end implementations.
4.1 Overview
Like LTAG, TAGLET analyzes sentences as a com-
plex of atomic elements combined by two kinds of
operations, complementation and modification.
Abstractly, complementation combines a head with an
argument which is syntactically obligatory and se-
mantically dependent on the head. Abstractly, mod-
ification combines a head with an adjunct which is
syntactically optional and need not involve any spe-
cial semantic dependence. Crucially for generation,
in a derivation, modification and complementation
operations can apply to a head in any order, often
yielding identical structures in surface syntax. This
means the generator can provide required material
first, then elaborate it, enabling use of grammar in
high-level tasks such as the planning of referring ex-
pressions or the “aggregation” of related semantic
material into a single complex sentence.
[1] If the acronym must stand for something, “Tree Assembly
Grammar for LExicalized Teaching” will do.
Concretely, TAGLET operations are implemented
by operations that rewrite trees. Each lexical
element is associated with a fragmentary
phrase-structure tree containing a distinguished word called
the anchor. For complementation, TAGLET adopts
TAG’s substitution operation; substitution replaces
a leaf node in the head tree with the phrase
structure tree associated with the complement. See
Figure 1. For modification, TAGLET adopts the
sister-adjunction operation defined in (Rambow et
al., 1995); sister-adjunction just adds the modifier
subtree as a child of an existing node in the head
tree, either on the left of the head (forward
sister-adjunction) as in Figure 2, or on the right of the head
(backward sister-adjunction). I describe TAGLET
formally in Appendix A.
[Figure 1: Substitution (complementation).]
[Figure 2: Forward sister-adjunction (modification).]
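The two tree-rewriting operations described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the course's Prolog implementation or the formal definition of Appendix A: the names Node, substitute, sister_adjoin, show and derive_example are hypothetical, lexical leaves carry their word directly on the category node, and sister-adjunction is simplified to place the modifier at the far left (forward) or far right (backward) of the target node's children.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A phrase-structure node; `word` is set only on lexical leaves."""
    cat: str
    word: Optional[str] = None
    children: List["Node"] = field(default_factory=list)

    def is_open(self) -> bool:
        # An open leaf is a substitution site: no word and no children yet.
        return self.word is None and not self.children


def substitute(head: Node, comp: Node) -> bool:
    """Replace the leftmost open leaf of category comp.cat with comp (in place)."""
    for i, child in enumerate(head.children):
        if child.is_open() and child.cat == comp.cat:
            head.children[i] = comp
            return True
        if substitute(child, comp):
            return True
    return False


def sister_adjoin(head: Node, cat: str, mod: Node, forward: bool) -> bool:
    """Add mod as a new child of the first internal node labeled cat (in place).

    Simplifying assumption: forward=True puts mod before all existing
    children, forward=False puts it after them.
    """
    if head.cat == cat and head.children:
        if forward:
            head.children.insert(0, mod)
        else:
            head.children.append(mod)
        return True
    return any(sister_adjoin(c, cat, mod, forward) for c in head.children)


def show(node: Node) -> str:
    """Render a tree in bracket notation."""
    if node.word is not None:
        return f"[{node.cat} {node.word}]"
    inner = " ".join(show(c) for c in node.children)
    return f"[{node.cat} {inner}]" if inner else f"[{node.cat}]"


def derive_example() -> Node:
    """Derive 'Chris loves Sandy madly' from four elementary trees."""
    loves = Node("S", children=[
        Node("NP"),
        Node("VP", children=[Node("V", "loves"), Node("NP")]),
    ])
    substitute(loves, Node("NP", "Chris"))   # fills the subject slot
    substitute(loves, Node("NP", "Sandy"))   # fills the object slot
    sister_adjoin(loves, "VP", Node("ADVP", "madly"), forward=False)
    return loves
```

Calling show(derive_example()) renders the derived tree in bracket notation. Because the modifier is simply appended to the children of VP, the two substitutions and the sister-adjunction could be applied in any order with the same result, which is the property the generator relies on.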
TAGLET is equivalent in weak generative power
to context-free grammar. That is, any language
defined by a TAGLET also has a CFG, and any
language defined by a CFG also has a TAGLET. On the
other hand, context-free grammars can have
derivations in which all lexical items are arbitrarily far
from the root; TAGLET derived structures always
have an anchor whose path to the root of the sen-
tence has a fixed length given by a grammatical ele-
ment. See Appendix B. The restriction seems of lit-
tle linguistic significance, since any tree-bank parse
induces a unique TAGLET grammar once you la-
bel which child of each node is the head, which are
complements and which are modifiers. Indeed, since
TAGLET thus induces bigram dependency
structures from trees, this invites the estimation of
probability distributions on TAGLET derivations based on
observed bigram dependencies; see (Chiang, 2000).
[Figure 3: Parallel analysis in TAGLET and TAG. Elementary trees:
[NP Chris]; [S NP [VP [V loves] NP]]; [NP Sandy]; and the modifier
tree VP* \ [ADVP madly], which attaches ADVP under a VP node.]
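To make the induction step concrete, here is a small Python sketch of how a head-annotated parse yields both an elementary tree and bigram dependencies. The representation is an assumption made for illustration: each node is a (category, role, body) triple, the role labels "head", "comp" and "mod" are hypothetical names for the three kinds of children, and the functions head_word, dependencies and elementary_tree are not taken from the course materials.

```python
from typing import List, Tuple

# A node is (category, role, body). The role ("head", "comp" or "mod") is
# relative to the parent; the body is a word or a list of child nodes.
PARSE = ("S", "head", [
    ("NP", "comp", "Chris"),
    ("VP", "head", [
        ("V", "head", "loves"),
        ("NP", "comp", "Sandy"),
        ("ADVP", "mod", "madly"),
    ]),
])


def head_word(tree) -> str:
    """Follow head children down to the anchoring word."""
    _cat, _role, body = tree
    if isinstance(body, str):
        return body
    heads = [child for child in body if child[1] == "head"]
    if len(heads) != 1:
        raise ValueError("each internal node needs exactly one head child")
    return head_word(heads[0])


def dependencies(tree) -> List[Tuple[str, str, str]]:
    """List (head word, dependent word, role) for every non-head child."""
    _cat, _role, body = tree
    if isinstance(body, str):
        return []
    anchor = head_word(tree)
    deps = []
    for child in body:
        if child[1] != "head":
            deps.append((anchor, head_word(child), child[1]))
        deps.extend(dependencies(child))
    return deps


def elementary_tree(tree) -> str:
    """Bracketed elementary tree for the anchor of `tree`.

    Head children are expanded, complements become open slots, and
    modifiers are left out because they form elementary trees of their own.
    """
    cat, _role, body = tree
    if isinstance(body, str):
        return f"[{cat} {body}]"
    parts = []
    for child in body:
        if child[1] == "head":
            parts.append(elementary_tree(child))
        elif child[1] == "comp":
            parts.append(f"[{child[0]}]")
    return f"[{cat} " + " ".join(parts) + "]"
```

For the parse above, elementary_tree(PARSE) gives the tree anchored by loves with two open NP slots, and dependencies(PARSE) pairs loves with each of Chris, Sandy and madly. Counting such pairs over a tree-bank is the kind of statistic a probability model over derivations could be estimated from.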
To implement an effective TAGLET generator,
you can perform a greedy head-first search of deriva-
tions guided by heuristic progress toward achieving
communicative goals (Stone et al., 2001). Mean-
while, because TAGLET is context-free, you can
easily write a CKY-style dynamic programming
parser that stores structures recognized for spans of
text in a chart, and iteratively combines structures
in adjacent spans until the analyses span the entire
sentence. (More complexity would be required for
multiply-anchored trees, as they induce discontinu-
ous constituents.) The simple requirement that op-
erations never apply inside complements or modi-
fiers, and apply left-to-right within a head, suffices
to avoid spurious ambiguity. See Appendix C.
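The chart-filling idea can be illustrated with a textbook CKY recognizer in Python. Note what this sketch is and is not: it is ordinary CKY over a small binarized CFG, not the TAGLET parser of Appendix C, and it does not implement the restrictions that avoid spurious ambiguity. The grammar tables LEXICAL and BINARY and the function names are hypothetical; the rule VP -> VP ADVP is a binarized stand-in for attaching a modifier under VP.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form.
LEXICAL = {
    "Chris": {"NP"},
    "Sandy": {"NP"},
    "loves": {"V"},
    "madly": {"ADVP"},
}
BINARY = {
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
    ("VP", "ADVP"): {"VP"},   # stands in for modification of VP
}


def cky(words, lexical=LEXICAL, binary=BINARY):
    """Return a chart mapping (i, j) to the categories spanning words[i:j]."""
    n = len(words)
    chart = defaultdict(set)
    # Spans of width one come straight from the lexicon.
    for i, word in enumerate(words):
        chart[i, i + 1] |= lexical.get(word, set())
    # Wider spans combine two adjacent, already completed spans.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for left in chart[i, k]:
                    for right in chart[k, j]:
                        chart[i, j] |= binary.get((left, right), set())
    return chart


def recognize(words, start="S"):
    """True if some analysis of category `start` spans the whole input."""
    return start in cky(words)[0, len(words)]
```

A TAGLET parser would store partially completed trees rather than bare categories in each chart cell, but the control structure, iterating over span widths and split points, would be much the same.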
4.2 Examples
With TAGLET, two kinds of examples are instruc-
tive: those where TAGLET can mirror TAG, and
those where it cannot. For the first case, consider
an analysis of Chris loves Sandy madly by the trees
of Figure 3. The final structure is:
S
  NP  Chris
  VP
    V  loves
    NP  Sandy
    ADVP  madly
For the second case, consider the embedded ques-
tion who Chris thinks Sandy likes. The usual TAG
analysis uses the full power of adjunction. TAGLET
requires the use of one of the familiar context-free
filler-gap analyses, as perhaps that suggested by the
trees in Figure 4, and their composition:
Q
  NP  who
  S/NP
    NP  Chris
    VP/NP
      V  thinks
      S/NP
        NP  Sandy
        VP/NP
          V  likes
[Figure 4: TAGLET requires a gap-threading analysis of extraction
(or another context-free analysis). Elementary trees:
[Q [NP who] S/NP]; [NP Chris]; [S/NP NP [VP/NP [V thinks] S/NP]];
[NP Sandy]; [S/NP NP [VP/NP [V likes]]].]
The use of syntactic features amounts to an in-
termediate case. In TAGLET derivations (unlike in
TAG) nodes accrete children during the course of a
derivation but are never rewritten or split. Thus, we
can decorate any TAGLET node with a single set
of syntactic features that is preserved throughout the
derivation. Consider the trees for he knows below:
NP [NM SG, CS X]  /he/

S
  NP [NM Y, CS N]
  VP
    V [NM Y]  /know/
When these trees combine, we can immediately
unify the number Y of the verb with the pronoun’s
singular; we can immediately unify the case X of the
pronoun with the nominative assigned by the verb:
S
  NP [NM SG, CS N]  /he/
  VP
    V [NM SG]  /know/
The feature values will be preserved by further steps
of derivation.
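The unification step for he knows can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Prolog unification students would use in the course: feature sets are plain dictionaries, the class Var and the functions walk, unify and resolve are hypothetical names, and feature values are atomic (no nested structures).

```python
class Var:
    """A feature variable such as X or Y; two variables are equal only if identical."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __repr__(self) -> str:
        return f"?{self.name}"


def walk(value, bindings):
    """Follow variable bindings until reaching a constant or an unbound variable."""
    while isinstance(value, Var) and value in bindings:
        value = bindings[value]
    return value


def unify(f, g, bindings):
    """Unify the shared features of f and g.

    Returns an extended copy of `bindings`, or None on a clash. Test the
    result with `is None`, since an empty dict is a valid success value.
    """
    new = dict(bindings)
    for name in sorted(f.keys() & g.keys()):
        a, b = walk(f[name], new), walk(g[name], new)
        if a is b or a == b:
            continue
        if isinstance(a, Var):
            new[a] = b
        elif isinstance(b, Var):
            new[b] = a
        else:
            return None   # two different constants
    return new


def resolve(features, bindings):
    """Replace bound variables in a feature set by their values."""
    return {name: walk(value, bindings) for name, value in features.items()}


X, Y = Var("X"), Var("Y")
HE = {"NM": "SG", "CS": X}          # the pronoun: singular, case open
SUBJECT = {"NM": Y, "CS": "N"}      # the verb's subject slot: nominative
VERB = {"NM": Y}                    # the verb shares number with its subject
```

Unifying HE with SUBJECT binds X to N and Y to SG, so resolving VERB afterwards gives singular number on the verb. Because every node keeps one feature set for the whole derivation, a single table of bindings is enough.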
4.3 Building on TAGLET
Semantics and pragmatics are crucial to NLP.
TAGLET lets students explore meaty issues in se-
mantics and pragmatics, using the unification-based
semantics proposed in (Stone and Doran, 1997). We
view constituents as referential, or better, indexical;
we link elementary trees with constraints on these
indices and conjoin the constraints in the meaning
of a compound structure. This example shows how
the strategy depends on a rich ontology:
S:e
  NP:c  Chris
  VP:e
    V:e  loves
    NP:s  Sandy
    ADVP:e  madly
chris(c) ∧ sandy(s) ∧ love(e, c, s) ∧ mad(e)
The example also shows how the strategy lets us
quickly implement, say, the constraint-satisfaction
approaches to reference resolution or the plan-
recognition approaches to discourse integration de-
scribed in (Stone and Webber, 1998).
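As a rough illustration of conjoining constraints and resolving the indices against a context, consider the following Python sketch. It uses brute-force enumeration rather than the methods of (Stone and Webber, 1998), and everything in it is hypothetical: the entity names c1, s1, e1 and e2, the FACTS table, and the functions conjoin and solutions.

```python
from itertools import product

# Constraints contributed by each elementary tree, over indices c, s and e.
CHRIS = [("chris", ("c",))]
SANDY = [("sandy", ("s",))]
LOVES = [("love", ("e", "c", "s"))]
MADLY = [("mad", ("e",))]

# A hypothetical context: two loving events, only one of them mad.
DOMAIN = ["c1", "s1", "e1", "e2"]
FACTS = {
    ("chris", ("c1",)),
    ("sandy", ("s1",)),
    ("love", ("e1", "c1", "s1")),
    ("love", ("e2", "c1", "s1")),
    ("mad", ("e1",)),
}


def conjoin(*constraint_lists):
    """The meaning of a compound structure: the conjunction of its parts."""
    return [c for constraints in constraint_lists for c in constraints]


def solutions(constraints, facts, domain):
    """Yield every assignment of entities to indices that satisfies all constraints."""
    variables = sorted({v for _pred, args in constraints for v in args})
    for values in product(domain, repeat=len(variables)):
        env = dict(zip(variables, values))
        if all((pred, tuple(env[a] for a in args)) in facts
               for pred, args in constraints):
            yield env
```

With the constraints from Chris, loves and Sandy alone, both e1 and e2 remain candidates for the event index; adding the constraint from madly leaves only e1. The modifier narrows interpretation exactly because it shares the index e with the verb.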
4.4 Lectures and Assignments
Here is a plan for a six-week TAGLET module. The
first two weeks introduce data structures and recur-
sive programming in Prolog, with examples drawn
from phrase structure trees and syntactic combi-
nation; and discuss dynamic-programming parsers,
with an aside on convenient implementation using
Prolog assertion. As homework, students implement
simple tree operations, and build up to definitions of
substitution and modification for parsing and gener-
ation; they use these combinatory operations to write
a CKY TAGLET parser.
The next two weeks begin with lectures on the
lexicon, emphasizing abstraction on the computa-
tional side and the idiosyncrasy of lexical syntax and
the indexicality of lexical semantics on the linguis-
tic side; and continue with lectures on semantics and
interpretation. Meanwhile, students add reference
resolution to the parser, and implement routines to
construct grammars from tree-bank parses.
The final two weeks cover generation as problem-
solving, and search through the grammar. Students
reuse the grammar and interpretation model they al-
ready have to construct a generator.
5 Conclusion
Important as they are, lexicalized grammars can
be forbidding. Versions of TAG and combinatory
categorial grammars (CCG) (Steedman, 2000), as
presented in the literature, require complex book-
keeping for effective computation. When I wrote
a CCG parser as an undergraduate, it took me a
whole semester to get an implemented handle on
the metatheory that governs the interaction of (cross-
ing) composition or type-raising with spurious am-
biguity; I still have never written a TAG parser or a
CCG generator. Variants of TAG like TIG (Schabes
and Waters, 1995) or D-Tree grammars (Rambow
et al., 1995) are motivated by linguistic or formal
considerations rather than pedagogical or computa-
tional ones. Other formalisms come with linguistic
assumptions that are hard to manage. Link gram-
mar (Sleator and Temperley, 1993) and other pure
dependency formalisms can make it difficult to ex-
plore rich hierarchical syntax and the flexibility of
modification; HPSG (Pollard and Sag, 1994) comes
with a commitment to its complex, rather bewilder-
ing regime for formalizing linguistic information as
feature structures. Of course, you probably could
refine any of these theories to a simple core—and
would get something very like TAGLET.
I strongly believe that this distillation is worth
the trouble, because lexicalization ties grammar for-
malisms so closely to the motivations for studying
language in the first place. For linguistics, this phi-
losophy invites a fine-grained description of sen-
tence syntax, in which researchers document the di-
versity of linguistic constructions within and across
languages, and at the same time uncover impor-
tant generalizations among them. For computation,
this philosophy suggests a particularly concrete ap-
proach to language processing, in which the infor-
mation a system maintains and the decisions it takes
ultimately always just concern words. In taking
TAGLET as a starting point for teaching implemen-
tation in NLP, I aim to expose a broad range of stu-
dents to a lexicalized approach to the cognitive sci-
ence of human language that respects and integrates
both linguistic and computational advantages.
Acknowledgments
Thanks to the students of CS 533 and four anony-
mous reviewers for helping to disabuse me of nu-
merous preconceptions.

References
Philip E. Agre. 1997. Computation and Human Experi-
ence. Cambridge.
Ronald Brachman, Deborah McGuinness, Peter
Patel-Schneider, Lori Alperin Resnick, and Alexander
Borgida. 1990. Living with CLASSIC: when and how
to use a KL-ONE-like language. In J. Sowa, editor,
Principles of Semantic Networks. Morgan Kaufmann.
David Chiang. 2000. Statistical parsing with an
automatically-extracted tree adjoining grammar. In
ACL, pages 456–463.
Herbert H. Clark. 1996. Using Language. Cambridge
University Press, Cambridge, UK.
John E. Hopcroft, Rajeev Motwani, and Jeffrey D. Ull-
man. 2000. Introduction to automata theory, lan-
guages and computation. Addison-Wesley, second
edition.
Aravind K. Joshi, L. Levy, and M. Takahashi. 1975. Tree
adjunct grammars. Journal of Computer and System
Sciences, 10:136–163.
Daniel Jurafsky and James H. Martin. 2000. Speech
and Language Processing: An introduction to nat-
ural language processing, computational linguistics
and speech recognition. Prentice-Hall.
Fernando C. N. Pereira and Stuart M. Shieber. 1987.
Prolog and Natural Language Analysis. CSLI, Stan-
ford CA.
Carl Pollard and Ivan A. Sag. 1994. Head-Driven Phrase
Structure Grammar. University of Chicago Press,
Chicago.
Owen Rambow, K. Vijay-Shanker, and David Weir.
1995. D-Tree grammars. In ACL, pages 151–158.
Yves Schabes and Richard C. Waters. 1995. Tree-
insertion grammar: A cubic-time parsable formalism
that lexicalizes context-free grammar without chang-
ing the trees produced. Computational Linguistics,
21:479–513.
Yves Schabes. 1990. Mathematical and Computational
Aspects of Lexicalized Grammars. Ph.D. thesis, Com-
puter Science Department, University of Pennsylva-
nia.
Daniel Sleator and Davy Temperley. 1993. Parsing
English with a link grammar. In Third International
Workshop on Parsing Technologies.
Mark Steedman. 2000. The Syntactic Process. MIT.
Leon Sterling and Ehud Shapiro. 1994. The Art of Pro-
log. MIT, second edition.
Matthew Stone and Christine Doran. 1997. Sentence
planning as description using tree-adjoining grammar.
In Proceedings of ACL, pages 198–205.
Matthew Stone and Bonnie Webber. 1998. Textual
economy through close coupling of syntax and seman-
tics. In Proceedings of International Natural Lan-
guage Generation Workshop, pages 178–187.
Matthew Stone, Christine Doran, Bonnie Webber, Tonia
Bleam, and Martha Palmer. 2001. Microplanning
with communicative intentions: The SPUD system.
Under review.
