File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/94/h94-1109_metho.xml
Size: 4,493 bytes
Last Modified: 2025-10-06 14:13:56
<?xml version="1.0" standalone="yes"?> <Paper uid="H94-1109"> <Title>RESEARCH IN NATURAL LANGUAGE PROCESSING</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> RESEARCH IN NATURAL LANGUAGE PROCESSING </SectionTitle> <Paragraph position="0"/> </Section> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> PROJECT GOALS </SectionTitle> <Paragraph position="0"> Our central research focus is on the automatic acquisition of knowledge about language (both syntactic and semantic) from corpora. We wish to understand how the knowledge so acquired can enhance natural language applications, including document retrieval, information extraction, and machine translation. In addition to experimenting with acquisition procedures, we are continuing to develop the infrastructure needed for these applications (grammars and dictionaries, parsers, evaluation procedures, etc.).</Paragraph> <Paragraph position="1"> The work on information retrieval and supporting technologies (in particular, robust, fast parsing), directed by Tomek Strzalkowski, is described in a separate page in this section, as well as a paper in this volume.</Paragraph> </Section> <Section position="3" start_page="0" end_page="466" type="metho"> <SectionTitle> RECENT ACCOMPLISHMENTS </SectionTitle> <Paragraph position="0"> * Extended earlier work on the acquisition of semantic patterns from syntactically-analyzed corpora, and on the generalization of these patterns using word similarity measures obtained from the corpora. Measured the coverage of the collected patterns as a function of corpus size, and compared this with an analytic model for such coverage.</Paragraph> <Paragraph position="1"> * Participated in Message Understanding Conference - 5. 
Substantially extended our lexical preprocessor to identify company names, people's names, locations, etc.</Paragraph> <Paragraph position="2"> Added an acquisition tool for lexico-semantic models, which allows users to specify correspondences between lexical and semantic structures through example sentences. * Organized a meeting to plan Message Understanding Conference - 6. Coordinated efforts to develop the different corpus annotations which will be required. (These plans and annotations are described in a separate paper in this volume.) * Developed improved procedures for the alignment of syntactic structures in sentences drawn from parallel bilingual corpora. The goal of this effort is to automatically learn transfer rules for a machine translation system from a bilingual corpus; the starting point is an (incomplete) set of word correspondences from a bilingual dictionary. Demonstrated (using a small Spanish-English corpus) that an iterative algorithm, which uses initial alignments to obtain additional correspondences between words and between grammatical roles, can yield better final alignments. (This work is also supported by the National Science Foundation.) * Continued studies of appropriate feature structures for a common, broad-coverage syntactic dictionary of English (COMLEX). This work complemented the ongoing effort to create COMLEX, which is being supported by ARPA through the Linguistic Data Consortium. 
(The work on COMLEX is described in a separate paper in this volume.)</Paragraph> </Section> <Section position="4" start_page="466" end_page="466" type="metho"> <SectionTitle> PLANS FOR THE COMING YEAR </SectionTitle> <Paragraph position="0"> * Extend earlier work on stochastic grammars for parsing: experiment with alternative word contexts for use in computing conditional probabilities; experiment with alternative search algorithms to obtain speed/precision trade-offs.</Paragraph> <Paragraph position="1"> * Continue work on semantic pattern acquisition procedures. Experiment with alternative measures of word similarity for use in generalizing patterns extracted from corpora.</Paragraph> <Paragraph position="2"> * Continue planning for MUC-6. Coordinate efforts to develop specifications and annotated corpora for named entities, predicate-argument structure, coreference, and word sense information; to develop scoring rules for the different evaluations; and to define tasks for the MUC-6 dry run in Fall 1994.</Paragraph> <Paragraph position="3"> * Apply the bilingual alignment algorithm to larger corpora. Develop generalization algorithms for transfer rules extracted from a bilingual corpus.</Paragraph> </Section> </Paper>