
<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-1015">
  <Title>Handling noisy training and testing data</Title>
  <Section position="7" start_page="9" end_page="9" type="concl">
    <SectionTitle>
6 Conclusion
</SectionTitle>
    <Paragraph position="0"> In this paper, we have given a new characterisation of the sorts of noise one nds in empirical NLP, and a roadmap for dealing with it in the future. For many of the problems in the eld, the state of the art is now su ciently advanced that evaluation error is becoming a signi cant factor in reported results; we show that it is correctable within the constraints of practicality and ethics.</Paragraph>
    <Paragraph position="1"> Although our examples all came from the Penn treebank, the taxonomy presented is applicable to  We did not run corrections on, nor do we show results for, Blaheta and Charniak's \misc&amp;quot; grouping, both because there were very many of them in the reported error list and because they are very frequently wrong in the treebank.</Paragraph>
    <Paragraph position="2"> any corpus annotation project. As long as there are typographical errors, there will be Type B errors; and unclear or counterintuitive guidelines will forever engender Type A and Type C errors. Furthermore, we expect that the experimental improvement shown in Section 5 will be reflected in projects on other annotated corpora|perhaps to a lesser or greater degree, depending on the di culty of the annotation task and the prior performance of the computer system. null An e ect of the continuing improvement of the state of the art is that researchers will begin (or have begun) concentrating on speci c subproblems, and will naturally report results on those subproblems. These subproblems are likely to involve the complicated cases, which are presumably also more subject to annotator error, and are certain to involve smaller test sets, thus increasing the performance effect of each individual misannotation. As the sizes of the subproblems decrease and their complexity increases, the ability to correct the evaluation corpus will become increasingly important.</Paragraph>
  </Section>
class="xml-element"></Paper>