<?xml version="1.0" standalone="yes"?>
<Paper uid="H94-1009">
<Title>The Hub and Spoke Paradigm for CSR Evaluation</Title>
<Section position="3" start_page="40" end_page="40" type="intro">
<SectionTitle> 4. Summary </SectionTitle>
<Paragraph position="0"> In the first trial of the Hub and Spoke evaluation paradigm in November 1993, 11 research sites participated, including 5 sites outside the ARPA community. The result was a rich array of comparative and contrastive results on several important problems in large vocabulary CSR, all calibrated to the current state-of-the-art performance levels. A complete listing of the numerical results can be found in [2]. For interpretive results, the interested reader should consult the contemporary papers of the participating sites.</Paragraph>
<Paragraph position="1"> Two cautions are in order when attempting to interpret these results. First, since the acoustic training and development test data were distributed quite late, and since the Hub and Spoke paradigm was under development up to two months prior to the evaluation, a considerable burden was imposed on the participants, who were rushed through the data processing and system training steps and were often denied a complete understanding of the rules. Some anomalies in the results did occur due to these undesirable circumstances. Secondly, it is important to remember that the only tests for which fair and informative direct comparisons can be made across systems (and sites) are the controlled C1 contrasts for either of the two Hub tests. All other tests are designed to produce informative comparisons only within a given system run in two contrastive modes. So, in general, only intra-system comparisons should be made on the Spoke tests.</Paragraph>
<Paragraph position="2"> The Hub and Spoke evaluation paradigm appears to have met the competing requirements of supporting the variety of important research interests within the ARPA CSR community while providing a mechanism to focus that work into well-defined and competitively charged evaluations of enabling technology. It is a flexible framework that encourages work on diverse problems in CSR.</Paragraph>
<Paragraph position="3"> It is also a very structured framework that treats all tests conducted in an evaluation as if they were scientific experiments, specifying controls where appropriate to maximize the amount of information contained in the results. This structure also helps keep the effort of the participating research community focused around a productive core of problems. If the Hub and Spoke paradigm is to be truly successful, however, it will need to sustain that focus over time in a manner analogous to the very successful Resource Management based evaluations of the late 1980s.</Paragraph>
<Paragraph position="4"> Contact Information The complete specification of the 1993 evaluation is contained in a documentation file included with the 1993 evaluation data distributed by NIST. It can also be obtained by sending a request to Francis Kubala via e-mail to fkubala@bbn.com.</Paragraph>
<Paragraph position="5"> Sites that are interested in participating in future ARPA-sponsored CSR evaluations can notify David Pallett at the National Institute of Standards and Technology (NIST). E-mail: dave@jaguar.ncsl.nist.gov.</Paragraph>
<Paragraph position="6"> The evaluation test data, as well as the training and development test data used in this evaluation, and all accompanying documentation, are available from the Linguistic Data Consortium (LDC). E-mail: ldc@unagi.cis.upenn.edu.</Paragraph>
</Section>
</Paper>