<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1101">
  <Title>Detecting Errors in Corpora Using Support Vector Machines</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> While corpus-based research relies on human-annotated corpora, it is often said that a non-negligible number of errors remain even in frequently used corpora such as the Penn Treebank.</Paragraph>
    <Paragraph position="1"> Detecting errors in annotated corpora is therefore important for corpus-based natural language processing. In this paper, we propose a method for detecting errors in corpora using support vector machines (SVMs). The method is based on the idea of extracting exceptional elements that violate the consistency of the annotation: it uses SVMs to assign a weight to each element and uses these weights to find errors in a part-of-speech (POS) tagged corpus.</Paragraph>
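A minimal Python sketch of the weighting idea, under several assumptions that are not taken from the abstract: each element is a token represented by a sparse feature dictionary; only one one-vs-rest classifier (tag DT versus any other tag) is trained; the per-element weight is taken to be the SVM dual coefficient alpha_i, which is bounded above by C; and an element is flagged when its alpha_i reaches C while the trained model still misclassifies it. The trainer is plain dual coordinate descent for a linear SVM without a bias term. The toy data, feature names, and helpers (train_dual_svm, flag_candidates) are hypothetical, and this is not the paper's exact procedure.

```python
from collections import defaultdict

# Hypothetical sparse feature vector for one token: feature name -> value.
Vector = dict[str, float]


def dot(w: dict[str, float], x: Vector) -> float:
    """Sparse dot product w . x."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_dual_svm(xs: list[Vector], ys: list[int], c: float = 1.0,
                   epochs: int = 200) -> tuple[list[float], dict[str, float]]:
    """Linear SVM (no bias term) trained by dual coordinate descent.

    Returns one dual weight alpha_i in the range [0, C] per element and the
    primal weight vector w = sum_i alpha_i * y_i * x_i.
    """
    alpha = [0.0] * len(xs)
    w: dict[str, float] = defaultdict(float)
    for _ in range(epochs):
        for i, (x, y) in enumerate(zip(xs, ys)):
            q = dot(x, x)  # diagonal entry Q_ii = ||x_i||^2
            if q == 0:
                continue
            grad = y * dot(w, x) - 1.0  # partial derivative of the dual objective
            new = min(max(alpha[i] - grad / q, 0.0), c)  # project onto [0, C]
            delta = new - alpha[i]
            if delta:
                for f, v in x.items():
                    w[f] += delta * y * v  # keep w consistent with alpha
                alpha[i] = new
    return alpha, dict(w)


def flag_candidates(xs: list[Vector], ys: list[int], c: float = 1.0) -> list[int]:
    """Assumed criterion: alpha_i at the bound C and y_i * (w . x_i) negative."""
    alpha, w = train_dual_svm(xs, ys, c)
    return [i for i, (x, y) in enumerate(zip(xs, ys))
            if alpha[i] >= c - 1e-9 and 0 > y * dot(w, x)]


# Toy one-vs-rest data for the tag DT (+1 = tagged DT, -1 = any other tag).
the = {"w=the": 1.0, "next=NN": 1.0}
dog = {"w=dog": 1.0, "prev=DT": 1.0}
a = {"w=a": 1.0, "next=NN": 1.0}
xs = [the] * 5 + [dog] * 4 + [a] * 2
# xs[4] is an injected error: "the" tagged as something other than DT.
ys = [+1, +1, +1, +1, -1] + [-1] * 4 + [+1] * 2

print(flag_candidates(xs, ys))  # prints [4]
```

The misclassification check matters for duplicated inputs: coordinate descent can spread weight unevenly over identical points, so a correctly tagged copy of "the" may also end with alpha at C, but only the mislabeled copy lies on the wrong side of the decision boundary.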
    <Paragraph position="2"> We apply the method to English and Japanese POS-tagged corpora and achieve high precision in detecting errors.</Paragraph>
  </Section>
</Paper>