<?xml version="1.0" standalone="yes"?>
<Paper uid="M93-1008">
  <Title>THE STATISTICAL SIGNIFICANCE OF THE MUC-5 RESULTS</Title>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
STATISTICAL SIGNIFICANCE RESULTS
</SectionTitle>
    <Paragraph position="0"> Statistical significance results are reported here for the following metrics: Error per Response Fill, F-Measure with recall and precision weighted equally, and Richness-Normalized Error (minimum and maximum). Systems are compared only within the same domain and language, so there are four figures for each metric: English Joint Ventures (EJV), Japanese Joint Ventures (JJV), English Microelectronics (EME), and Japanese Microelectronics (JME). Results are reported as groupings of systems that are not significantly different from each other at the 0.01 level, that is, with a confidence of at least 99%. Systems that are not significantly different from each other are underscored on the same line. The systems are numbered to save space, and the correspondence between system numbers and system sites is given below the significance results.</Paragraph>
    <Paragraph position="1"> It is interesting to note that the rankings of systems do not change in any statistically significant way between the Error per Response Fill metric and the F-Measure. The numerical rankings change slightly (numbers 6 and 7 in EJV reverse, and numbers 4 and 5 in JJV reverse), but those changes are not statistically significant because the two members of each reversed pair fall in the same significance grouping under both metrics. It is also interesting to note that the Error per Response Fill metric distinguishes four more systems than the F-Measure over all domains and languages. The Richness-Normalized Error metric distinguishes far fewer systems statistically than the Error per Response Fill metric, with 29 systems distinguished by Richness-Normalized Error as opposed to 55 by Error per Response Fill for EJV alone. The minimum and maximum Richness-Normalized Error metrics produce the same rankings and statistical results, so they are conflated here. The statistical groupings of systems for Richness-Normalized Error are so large and so numerous that systems cannot be distinguished well enough to reflect their perceived differences in performance. This is believed to be because the Richness-Normalized Error metric ignores the amount of spurious data generated by a system, whereas the amount and kind of spurious data generated affect the perception of how well the system is doing in an operational setting.</Paragraph>
  </Section>
</Paper>