<?xml version="1.0" standalone="yes"?> <Paper uid="H94-1022"> <Title>SEMANTIC EVALUATION FOR SPOKEN-LANGUAGE SYSTEMS</Title> <Section position="4" start_page="0" end_page="0" type="intro"> <SectionTitle> 1. INTRODUCTION </SectionTitle> <Paragraph position="0"> Since the summer of 1993, there has been considerable discussion in the ARPA HLT community of moving the evaluation of understanding systems for both spoken and written language away from application-based metrics (such as correct database response in ATIS, or template fills in MUC) toward technology-based metrics. The benefits hoped for from such a shift include a greater focus on underlying technology issues, rather than application issues, and a lower overhead for participating in evaluations, in terms of developing application systems. The discussions have focused on the concept of a semantic evaluation, or &quot;SemEval,&quot; consisting of three components: word-sense identification, predicate-argument structure determination, and coreference determination. This paper reports on how these ideas are being developed within the ARPA spoken-language community, in preparation for an initial spoken-language SemEval on ATIS data concurrent with the ATIS CAS (database answer) evaluation planned for November/December 1994.</Paragraph> <Paragraph position="1"> A meeting was held at SRI, 21-23 October 1993, to begin fleshing out these ideas for the evaluation of spoken-language understanding systems.
The meeting was attended by researchers, annotators, and evaluators involved in both the ARPA Spoken Language Program and the ARPA Written Language Program: Fernando Pereira from AT&amp;T Bell Laboratories; Rusty Bobrow and Dave Stallard from BBN; Wayne Ward and Sergei Nirenburg from CMU; Stephanie Seneff and Eric Brill from MIT; Robert Moore, Kate Hunicke-Smith, Jerry Hobbs, Harry Bratt, and Mark Gawron from SRI; Debbie Dahl and Lew Norton from Unisys; Mitch Marcus and Grace Kim from the University of Pennsylvania; Nancy Chinchor from SAIC; George Doddington from ARPA; George Miller from Princeton University; Dave Pallett and Bruce Lund from NIST; and Ralph Grishman from NYU. This paper is derived from the discussions at the October meeting and from subsequent proposals made and discussed by the participants.</Paragraph> </Section> </Paper>