<?xml version="1.0" standalone="yes"?> <Paper uid="H92-1115"> <Title>RESEARCH IN NATURAL LANGUAGE PROCESSING</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> RESEARCH IN NATURAL LANGUAGE PROCESSING </SectionTitle> <Paragraph position="0"/> </Section> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> PROJECT GOALS </SectionTitle> <Paragraph position="0"> Our general objective has been to improve techniques for extracting information from natural language text and for retrieving natural language documents. Our focus has been on methods that automatically learn the syntactic and semantic properties of the language used in particular domains, and on techniques for making natural language analyzers more robust. This work spans several areas; we summarize our accomplishments and plans for each area below.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> INFORMATION EXTRACTION </SectionTitle> <Paragraph position="0"> A major portion of our effort over the past year was devoted to participation in the Third Message Understanding Conference (MUC-3). Because the MUC-3 messages (newspaper reports of terrorist activity) were significantly more complex than those used for MUC-2, our information extraction system required a number of enhancements. These included extensions to our grammar; the use of a commercial machine-readable dictionary, the Oxford Advanced Learner's Dictionary, as our primary source of lexical information; and several additional recovery techniques for cases in which an entire sentence cannot be analyzed syntactically or semantically. These enhancements are described in detail in the MUC-3 Proceedings. Developing a semantic model of the terrorist domain also required substantial effort. The performance of the resulting system compared favorably with that of the other systems participating in MUC-3. 
We are currently developing further enhancements to the system for participation in MUC-4 in June 1992.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> PARSER EVALUATION </SectionTitle> <Paragraph position="0"> To assess alternative techniques for improving parsing performance, we have applied a metric of parsing quality suggested by Ezra Black. Using this metric, we compared the University of Pennsylvania Tree Bank parses for a portion of the MUC-3 corpus against the parses our system produced for the same text (after some automatic restructuring of our parses to improve their alignment with the Tree Bank). We evaluated a number of methods for improving parser performance, including fitted parses, closest attachment of modifiers, hypothesis merging, stochastic grammars, and stochastic part-of-speech taggers. Most of these results will be reported at the Third Conference on Applied Natural Language Processing.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> ACQUISITION OF SELECTIONAL PATTERNS </SectionTitle> <Paragraph position="0"> We parsed the MUC-3 corpus automatically, without selectional constraints, and then performed a frequency analysis of the resulting co-occurrence patterns to identify the common semantic patterns. When these patterns were used as the basis for selectional constraints in further parsing, they performed slightly better than manually prepared constraints. 
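The frequency-based acquisition step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the system's actual code: it assumes each automatically produced parse has already been reduced to hypothetical (head, relation, argument_class) triples, and the class labels and the min_count threshold are illustrative choices.

```python
from collections import Counter

# Assumed (hypothetical) input format: each parse produced WITHOUT
# selectional constraints has been reduced to (head, relation,
# argument_class) triples. The class labels below are illustrative only.
Triple = tuple[str, str, str]


def acquire_patterns(triples: list[Triple], min_count: int = 2) -> set[Triple]:
    """Keep the co-occurrence patterns seen at least min_count times."""
    counts = Counter(triples)
    return {pattern for pattern, n in counts.items() if n >= min_count}


def satisfies_constraints(triple: Triple, patterns: set[Triple]) -> bool:
    """Accept a candidate analysis only if its pattern was acquired."""
    return triple in patterns


parsed = [
    ("attack", "subject", "organization"),
    ("attack", "object", "place"),
    ("attack", "subject", "organization"),
    ("attack", "object", "place"),
    ("attack", "subject", "place"),  # produced by a single spurious parse
]

patterns = acquire_patterns(parsed)
print(sorted(patterns))
print(satisfies_constraints(("attack", "subject", "place"), patterns))
```

Patterns that recur across the corpus are retained, while a triple arising from a single spurious parse falls below the threshold and is then rejected when the acquired patterns serve as selectional constraints.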
We intend to extend this technique to substantially larger corpora in the coming year.</Paragraph> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> DOCUMENT RETRIEVAL </SectionTitle> <Paragraph position="0"> We have continued our research on using syntactic analysis to enhance keyword-based document retrieval, both by identifying patterns larger than single words and by automatically discovering term similarity relations from word co-occurrence patterns in a large corpus.</Paragraph> <Paragraph position="1"> When these techniques were applied to a small corpus of computer science abstracts, they yielded modest improvements in retrieval performance; these results are reported in a separate paper in these proceedings. Over the coming year we intend to apply these techniques to much larger corpora as part of the Text Retrieval Evaluation Conference.</Paragraph> </Section> <Section position="7" start_page="0" end_page="482" type="metho"> <SectionTitle> MULTI-LINGUAL SYSTEMS </SectionTitle> <Paragraph position="0"> We are continuing our work, sponsored jointly with the National Science Foundation, on Japanese-English sublanguage-based machine translation. We completed a small system for translating programming language texts, which also incorporates the reversible grammar technology we developed previously. Over the next year we intend to perform initial experiments on discovering transfer rules from parallel bilingual corpora. We are also developing a Spanish version of our information extraction system in order to better understand the problems of porting such a system across languages.</Paragraph> </Section> </Paper>