<?xml version="1.0" standalone="yes"?>
<Paper uid="A00-2020">
  <Title>Detecting Errors within a Corpus using Anomaly Detection</Title>
  <Section position="6" start_page="151" end_page="152" type="concl">
    <SectionTitle>
5 Conclusion
</SectionTitle>
    <Paragraph position="0"> This paper presents a fully automatic method for detecting errors in corpora using anomaly detection techniques. As shown, the anomalies detected in the Penn Treebank corpus tend to be tagging errors.</Paragraph>
    <Paragraph position="1"> This method has some inherent limitations because not all errors in the corpus would manifest themselves as anomalies. In infrequent contexts or ambiguous situations, the method may not have enough information to detect an error.</Paragraph>
    <Paragraph position="2"> In addition, if annotators are inconsistent with one another, the method would not detect the resulting errors, because they would be spread over a significant portion of the corpus and so would not appear anomalous.</Paragraph>
    <Paragraph position="3"> Although this paper presents a fully automatic method for error detection in corpora, the method can also be used semi-automatically for correcting errors. By guiding an annotator to the elements that are most likely errors, it can greatly reduce the number of elements that the annotator needs to examine.</Paragraph>
    <Paragraph position="4"> Future work in this area involves modeling the corpora with other probability distributions. The method is very sensitive to how effectively the probability model captures the normal elements. Extensions to the probability distributions presented here, such as adding information about word endings or using more features, could increase the accuracy of the probability distribution and the overall performance of the anomaly detection system. Other future work involves applying this method to other marked corpora.</Paragraph>
  </Section>
</Paper>