<?xml version="1.0" standalone="yes"?> <Paper uid="H93-1101"> <Title>RESEARCH IN NATURAL LANGUAGE PROCESSING</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> RESEARCH IN NATURAL LANGUAGE PROCESSING </SectionTitle> <Paragraph position="0"/> </Section> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> PROJECT GOALS </SectionTitle> <Paragraph position="0"> Our central research focus is on the automatic acquisition of knowledge about language (both syntactic and semantic) from corpora. We wish to understand how the knowledge so acquired can enhance natural language applications, including document retrieval, information extraction, and machine translation. In addition to experimenting with acquisition procedures, we are continuing to develop the infrastructure needed for these applications (grammars and dictionaries, parsers, grammar evaluation procedures, etc.).</Paragraph> <Paragraph position="1"> The work on information retrieval and supporting technologies (in particular, robust, fast parsing), directed by Tomek Strzalkowski, is described in a separate page in this section.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> RECENT ACCOMPLISHMENTS </SectionTitle> <Paragraph position="0"> * Developed techniques for computing word similarities based on the co-occurrence of words in the same (syntactic) contexts in a large corpus. Used these similarities to &quot;smooth&quot; automatically acquired frequency data on verb-argument and head-modifier co-occurrence, and demonstrated that the smoothing increases coverage of the patterns found in new texts. (This work is described in a paper in this volume.) * Participated in Message Understanding Conference 4. 
Incorporated an enhanced time analysis module, an enhanced reference resolution module, and a stochastic part-of-speech tagger into our information extraction component, as well as making general improvements to the semantic models of descriptions of terrorist incidents.</Paragraph> <Paragraph position="1"> Demonstrated a significant improvement in performance over MUC-3.</Paragraph> <Paragraph position="2"> * In order to gain a better understanding of the problems involved in porting natural language systems to new domains, &quot;translated&quot; our MUC-3/MUC-4 system for extracting information about terrorist incidents to process Spanish news reports. This required development of a relatively broad-coverage Spanish grammar and adaptation of the Collins Spanish-English machine-readable dictionary.</Paragraph> <Paragraph position="3"> Developed a prototype procedure for acquiring transfer rules from bilingual corpora through automatic alignment of parse trees in the source and target languages. Developed specifications for a common, broad-coverage syntactic dictionary of English (COMLEX).</Paragraph> <Paragraph position="4"> Continued participation in a group to define common metrics for grammar evaluation. Applied these metrics to the output of two different NYU parsers (the Proteus parser and the Tagged Text Parser) analyzing a Wall Street Journal corpus.</Paragraph> </Section> <Section position="4" start_page="0" end_page="407" type="metho"> <SectionTitle> PLANS FOR THE COMING YEAR </SectionTitle> <Paragraph position="0"> Participate in Message Understanding Conference - 5.</Paragraph> <Paragraph position="1"> Apply procedures for semantic pattern acquisition from corpora to speed the acquisition and broaden the coverage of the patterns for the &quot;joint-venture&quot; domain. Continue work on semantic pattern acquisition procedures. 
Experiment with larger corpora, with alternative measures of word similarity, and with clustering procedures to identify semantic classes.</Paragraph> </Section> </Paper>