Natural Language Research 
PIs: Aravind Joshi, Mitch Marcus, Mark Steedman, and Bonnie Webber 
Department of Computer and Information Science 
University of Pennsylvania 
Philadelphia, PA 19104 
email: joshi@cis.upenn.edu 
OBJECTIVE 
The main objective is basic research and system development 
leading to (1) characterization of (a) the information 
carried by syntax, semantics, and discourse structure and 
(b) its relation to the information carried by intonation, 
together with (c) development of methods for using this 
information in generation and understanding; (2) development 
of architectures for integrating utterance planning 
with lexical, syntactic, and intonational choice; and (3) 
development of incremental strategies for using syntactic, 
semantic, and pragmatic knowledge in understanding 
and generating language. 
RECENT ACCOMPLISHMENTS 
• An algorithm based on Earley's parser was designed 
for estimating the parameters of a stochastic 
context-free grammar. Unlike other approaches, 
this algorithm does not require the grammar to be 
in a normal form. 
• A new predictive left-to-right parser for TAG 
was designed and included in a software package 
(XTAG). 
• An X-based Graphical Interface for Tree-Adjoining 
Grammars (XTAG) has been released for distribu- 
tion. This software package includes: (1) a graphi- 
cal editor for trees; (2) a parser for unification-based 
tree-adjoining grammars; (3) utilities for defining 
grammars and lexicons for tree-adjoining grammars; 
and (4) a user manual. 
• The notion of stochastic tree-adjoining grammars 
was defined, and an algorithm was designed for 
estimating the probabilities of a stochastic TAG from 
a corpus. Lexicalized tree-adjoining grammar (LTAG) 
provides a stochastic model that is both hierarchical 
and sensitive to lexical information. 
• Developed a new notion of derivation for 
tree-adjoining grammars that is sensitive to the 
distinction between modifier and predicational 
auxiliary trees. This distinction is relevant to the design 
of probabilistic LTAGs. 
• Developed a new formalism, structure unification 
grammar, that brings together many of the key 
insights of a variety of grammatical formalisms in 
one framework, although at the cost of some 
increased computational complexity. 
• The Pereira-Pollack approach to incremental inter- 
pretation was extended to support a discourse-based 
algorithm for resolving verb phrase ellipsis. 
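
The first accomplishment above notes that an Earley-based method does not need the grammar to be in a normal form, whereas the inside-outside algorithm is conventionally stated for grammars in Chomsky normal form. The sketch below is not the estimation algorithm itself. It is only a minimal Earley recognizer, written under our own assumptions (the names earley_recognize, Item, and GRAMMAR are hypothetical, terminals are part-of-speech tags, and empty right-hand sides are not handled), to illustrate that the underlying chart procedure accepts unary, ternary, and left-recursive rules as they stand.

```python
from collections import namedtuple

# One Earley item: rule lhs -> rhs, dot position, and the chart index where it started.
Item = namedtuple("Item", ["lhs", "rhs", "dot", "origin"])


def earley_recognize(grammar, start, tokens):
    """Return True if `tokens` can be derived from `start`.

    `grammar` maps each nonterminal to a list of right-hand sides (tuples).
    Any symbol that is not a key of `grammar` is treated as a terminal.
    Assumption of this sketch: no rule has an empty right-hand side.
    """
    n = len(tokens)
    chart = [set() for _ in range(n + 1)]
    chart[0] = {Item(start, rhs, 0, 0) for rhs in grammar[start]}
    for i in range(n + 1):
        agenda = list(chart[i])
        while agenda:
            item = agenda.pop()
            if item.dot == len(item.rhs):
                # Completer: advance every item that was waiting for item.lhs.
                new_items = [p._replace(dot=p.dot + 1)
                             for p in chart[item.origin]
                             if p.dot < len(p.rhs) and p.rhs[p.dot] == item.lhs]
                target = i
            elif item.rhs[item.dot] in grammar:
                # Predictor: expand the nonterminal after the dot.
                sym = item.rhs[item.dot]
                new_items = [Item(sym, rhs, 0, i) for rhs in grammar[sym]]
                target = i
            elif i < n and tokens[i] == item.rhs[item.dot]:
                # Scanner: the next input token matches the terminal after the dot.
                new_items = [item._replace(dot=item.dot + 1)]
                target = i + 1
            else:
                new_items = []
                target = i
            for new in new_items:
                if new not in chart[target]:
                    chart[target].add(new)
                    if target == i:
                        agenda.append(new)
    return any(it.lhs == start and it.origin == 0 and it.dot == len(it.rhs)
               for it in chart[n])


# Toy grammar (hypothetical) with a unary rule, a ternary rule, and left recursion,
# none of which would be allowed in Chomsky normal form.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("det", "n"), ("NP", "PP")],
    "VP": [("v",), ("v", "NP"), ("v", "NP", "PP")],
    "PP": [("p", "NP")],
}
```

For example, earley_recognize(GRAMMAR, "S", "det n v det n p det n".split()) accepts the tag sequence, while earley_recognize(GRAMMAR, "S", ["det", "v"]) rejects it. A stochastic version would attach probabilities to the items; that step is not shown here.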
PLANS FOR THE COMING YEAR 
• Continue work on automatic extraction of linguis- 
tic structure, extending work on determination of 
part-of-speech tag sets and adding morphophone- 
mic rules to the morphology algorithm, focusing 
on automatically discovering high-level grammati- 
cal structure. 
• Extend the techniques used for the design of poly- 
nomial time and space shift-reduce parsers for arbi- 
trary context-free grammars to tree adjoining gram- 
mars. 
• Complete the work on stochastic tree-adjoining 
grammars, implement an algorithm for estimating 
the probabilities of a stochastic TAG from a corpus, 
and investigate the design of algorithms that use 
parsed corpora such as the Penn Treebank as the 
basis for estimating stochastic tree-adjoining 
grammars. 
• Complete the work on the new derivation for 
LTAGs based on the distinction between modifier 
and predicational auxiliary trees and integrate this 
formulation in the framework of stochastic TAGs. 
• Complete the integration of coordination in the 
tree-adjoining grammar framework. 
• Begin work on the problem of word-order variation, 
which is more common in languages such as 
German, Korean, and Japanese. 
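
One of the plans above is to estimate stochastic grammars from parsed corpora such as the Penn Treebank. As a rough illustration of what estimation from parsed data involves, the sketch below computes relative-frequency rule probabilities for a plain context-free grammar read off bracketed trees. This is a deliberately simpler context-free analogue, not an estimator for stochastic TAGs, and the function names and the two toy trees are hypothetical.

```python
from collections import Counter


def parse_tree(text):
    """Read a bracketed tree such as "(S (NP we) (VP run))".

    Returns (label, children), where each child is either a nested
    (label, children) pair or a word (a plain string).
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1
            label = tokens[pos]
            pos += 1
            children = []
            while tokens[pos] != ")":
                children.append(read())
            pos += 1  # skip the closing bracket
            return (label, children)
        word = tokens[pos]
        pos += 1
        return word

    return read()


def count_rules(tree, counts):
    """Add one count for the rule at the root of `tree`, then recurse."""
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        if not isinstance(child, str):
            count_rules(child, counts)


def estimate_rule_probabilities(bracketed_trees):
    """Relative-frequency estimate: count(lhs -> rhs) / count(lhs)."""
    counts = Counter()
    for text in bracketed_trees:
        count_rules(parse_tree(text), counts)
    totals = Counter()
    for (lhs, _), c in counts.items():
        totals[lhs] += c
    return {rule: c / totals[rule[0]] for rule, c in counts.items()}


# Two toy trees (hypothetical data, not taken from any treebank).
TREES = [
    "(S (NP we) (VP run))",
    "(S (NP they) (VP (V see) (NP we)))",
]
PROBS = estimate_rule_probabilities(TREES)
```

With these two trees, the rule NP -> we is used in two of the three NP expansions, so its estimate is 2/3, and the probabilities of all rules sharing a left-hand side sum to one.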
