
<?xml version="1.0" standalone="yes"?>
<Paper uid="M95-1004">
  <Title>STATISTICAL SIGNIFICANCE OF MUC-6 RESULTS</Title>
  <Section position="4" start_page="39" end_page="39" type="concl">
    <SectionTitle>
CONCLUSIONS
</SectionTitle>
    <Paragraph position="0"> The groupings in these tables allow an ordering that is less clean than we would like, but that is realistic at this point in the evaluation methodology research. In addition to looking at the scores, evaluation research on a more granular level is needed to understand the differences in the systems' performance. Such research could reveal strengths and weaknesses in extracting certain information and lead to test designs that focus research in areas that will directly impact operational value. Also, other factors that are of interest to consumers, such as speed, development data requirements, and so on, need to be considered when making comprehensive comparisons of systems.</Paragraph>
    <Paragraph position="1"> The entire community would benefit from more refined measured values and a better understanding of how the differences in human performance influence the results. Distinguishing systems at such a strict cutoff as we use in the statistics may only be justified if variations in human performance are smaller. After all, it is the human interpretation of the task definitions that informs the systems during development. Especially in Named Entity, where machine performance and human performance are close, we would expect inherent human differences in interpreting language during both system and answer key development to be a considerable factor holding the machines back.</Paragraph>
  </Section>
</Paper>