RESEARCH IN NATURAL LANGUAGE PROCESSING 
Ralph Grishman, Principal Investigator 
Department of Computer Science 
New York University 
New York, NY 10003 
PROJECT GOALS 
Our central research focus is on the automatic acquisition 
of knowledge about language (both syntactic and semantic) 
from corpora. We wish to understand how the knowledge so 
acquired can enhance natural language applications, includ- 
ing document retrieval, information extraction, and machine 
translation. In addition to experimenting with acquisition 
procedures, we are continuing to develop the infrastructure 
needed for these applications (grammars and dictionaries, 
parsers, grammar evaluation procedures, etc.). 
The work on information retrieval and supporting technolo- 
gies (in particular, robust, fast parsing), directed by Tomek 
Strzalkowski, is described in a separate page in this section. 
RECENT ACCOMPLISHMENTS 
• Developed techniques for computing word similarities 
based on the co-occurrence of words in the same (syntactic) 
contexts in a large corpus. Used these similarities 
to "smooth" automatically acquired frequency data 
on verb-argument and head-modifier co-occurrence, and 
demonstrated that the smoothing increases coverage of 
the patterns found in new texts. (This work is described 
in a paper in this volume.) 
• Participated in Message Understanding Conference 4 
(MUC-4). Incorporated an enhanced time analysis module, an 
enhanced reference resolution module, and a stochas- 
tic part-of-speech tagger into our information extraction 
component, as well as making general improvements to 
the semantic models of descriptions of terrorist incidents. 
Demonstrated a significant improvement in performance 
over MUC-3. 
• In order to gain a better understanding of the problems 
involved in porting natural language systems to new do- 
mains, "translated" our MUC-3/MUC-4 system for ex- 
tracting information about terrorist incidents to process 
Spanish news reports. This required development of a 
relatively broad-coverage Spanish grammar and adaptation 
of the Collins Spanish-English machine-readable 
dictionary. 
• Developed a prototype procedure for acquiring transfer 
rules from bilingual corpora through automatic alignment 
of parse trees in the source and target languages. 
• Developed specifications for a common, broad-coverage 
syntactic dictionary of English (COMLEX). 
• Continued participation in a group to define common 
metrics for grammar evaluation. Applied these metrics 
to the output of two different NYU parsers (the Proteus 
parser and the Tagged Text Parser) analyzing a Wall 
Street Journal corpus. 
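The similarity-based smoothing mentioned in the first accomplishment can be illustrated with a minimal sketch. It is not the procedure used in the project: the cosine measure, the similarity-weighted averaging, the function names, and the toy counts are all assumptions chosen only to show how counts observed for similar words can fill in a pattern that was never seen for a given word.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy data: (word, syntactic relation, head word, count).
triples = [
    ("bomb", "object-of", "plant", 3),
    ("bomb", "object-of", "explode", 2),
    ("grenade", "object-of", "explode", 2),
    ("grenade", "object-of", "throw", 4),
    ("report", "object-of", "write", 5),
]

def context_vectors(triples):
    """Map each word to the counts of the syntactic contexts it occurs in."""
    vectors = defaultdict(dict)
    for word, relation, head, count in triples:
        key = (relation, head)
        vectors[word][key] = vectors[word].get(key, 0) + count
    return dict(vectors)

def cosine(u, v):
    """Cosine similarity of two sparse count vectors (an assumed measure)."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

def smoothed_count(word, context, vectors):
    """Similarity-weighted average of the counts of `context` over all words.

    The word itself gets weight 1.0; every other word is weighted by its
    similarity to `word`, so dissimilar words contribute nothing.
    """
    own = vectors.get(word, {})
    weighted_sum = 0.0
    weight_total = 0.0
    for other, vec in vectors.items():
        weight = 1.0 if other == word else cosine(own, vec)
        weighted_sum += weight * vec.get(context, 0)
        weight_total += weight
    return weighted_sum / weight_total if weight_total else 0.0

vectors = context_vectors(triples)
```

In this toy data "bomb" never occurs as the object of "throw", but because it shares the context (object-of, explode) with "grenade", smoothed_count("bomb", ("object-of", "throw"), vectors) is greater than zero, while the unrelated pattern (object-of, write) stays at zero.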
PLANS FOR THE COMING YEAR 
• Participate in Message Understanding Conference - 5. 
Apply procedures for semantic pattern acquisition from 
corpora to speed the acquisition and broaden the coverage 
of the patterns for the "joint-venture" domain. 
• Continue work on semantic pattern acquisition procedures. 
Experiment with larger corpora, with alternative 
measures of word similarity, and with clustering procedures 
to identify semantic classes. 
