<?xml version="1.0" standalone="yes"?> <Paper uid="C02-1101"> <Title>Detecting Errors in Corpora Using Support Vector Machines</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> While corpus-based research relies on human-annotated corpora, it is often said that a non-negligible number of errors remain even in frequently used corpora such as the Penn Treebank.</Paragraph> <Paragraph position="1"> Detecting errors in annotated corpora is therefore important for corpus-based natural language processing. In this paper, we propose a method to detect errors in corpora using support vector machines (SVMs). The method is based on the idea of extracting exceptional elements that violate consistency. Specifically, we use SVMs to assign a weight to each element and to find errors in a POS-tagged corpus.</Paragraph> <Paragraph position="2"> We apply the method to English and Japanese POS-tagged corpora and achieve high precision in detecting errors.</Paragraph> </Section> </Paper>