<?xml version="1.0" standalone="yes"?>
<Paper uid="H92-1070">
<Title>SESSION 10b: CORE NL LEXICON AND GRAMMAR</Title>
<Section position="1" start_page="0" end_page="0" type="abstr">
<SectionTitle>SESSION 10b: CORE NL LEXICON AND GRAMMAR</SectionTitle>
<Paragraph position="0"/>
</Section>
<Section position="2" start_page="0" end_page="0" type="abstr">
<SectionTitle>ABSTRACT</SectionTitle>
<Paragraph position="0"> The value of generally available speech and text corpora in facilitating research is clear: these resources are simply too expensive for each site to gather individually, and fair comparative evaluation of approaches requires shared training and testing material.</Paragraph>
<Paragraph position="1"> Similar considerations apply to lexicons and grammars for broad-coverage NLP applications. Lexicons and grammars are very expensive to build, and it seems wasteful for every site to have to build them over again, especially in cases where no new invention is intended. Also, fair comparative evaluation of the other components in modular systems seems to require combining them with a shared lexicon and grammar.</Paragraph>
<Paragraph position="2"> However, it is unclear whether the community of researchers can agree that a particular design is appropriate, and whether a large-scale effort to implement it would converge on a result of fairly general utility. This session aimed to raise these issues and to begin a discussion that we hope will result in a recommendation for action in the near future. The newly formed Linguistic Data Consortium has a mandate to serve the community's needs in this area, and it will start from the ideas presented in this session by the panelists (Jerry Hobbs, Ralph Grishman, Paul Jacobs, Bob Ingria, Louise Guthrie, and Ken Church) and the audience.</Paragraph>
<Paragraph position="3"> 1. Summary of Panelists' Presentations Six panelists made brief individual presentations. Three of these dealt with lexical issues.</Paragraph>
<Paragraph position="4"> * Louise Guthrie (NMSU) presented a suggestion from Yorick Wilks for a &quot;core lexicon&quot; that would be application-independent and easy to adapt to a wide range of parsers.</Paragraph>
<Paragraph position="5"> * Bob Ingria (BBN) discussed his experience with the similarities and differences in the form and content of the lexical entries used by different systems.</Paragraph>
<Paragraph position="6"> * Ken Church (AT&amp;T), playing devil's advocate, argued that the effort to turn lexical raw materials into a shared computational lexicon might better be spent on more raw materials.</Paragraph>
<Paragraph position="7"> Three other panelists dealt with grammatical issues.</Paragraph>
<Paragraph position="8"> * Jerry Hobbs (SRI) argued for a project to produce a &quot;National Resource Grammar,&quot; an extensible, efficient, broad-coverage English grammar that could be distributed generally to the DARPA community and to other researchers.</Paragraph>
<Paragraph position="9"> * Ralph Grishman (NYU) suggested that we need to know where our grammars and parsers stand, and that an ongoing program of comparative quantitative evaluation would define the state of the art and should also lead to significant improvements.</Paragraph>
<Paragraph position="10"> * Paul Jacobs (GE) discussed his experience in combining components from earlier systems implemented at GE and CMU, doing the computational-linguistics equivalent of putting a Ford engine into a GM chassis; the fact that such &quot;transplants&quot; are feasible increases the credibility of an effort to produce common lexical and grammatical components.</Paragraph>
</Section>
</Paper>